This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] Remove MaxLookup from getUnderlyingObjects
Changes PlannedPublic

Authored by vitalybuka on Aug 26 2020, 6:41 PM.

Details

Summary

Support long chains of instruction where getUnderlyingObject fails.
getUnderlyingObjects already traverse large set on instructions so
there is no reason to limit some sequences by MaxLookup.

getUnderlyingObject needs MaxLookup to avoid infinit loops on cycles.

getUnderlyingObjects uses Visited so it can't get into infinit loops
if it calls getUnderlyingObject with MaxLookup > 0.

So we can continiosly repeate getUnderlyingObject updating Visited and
return all underlying Values.

Diff Detail

Event Timeline

vitalybuka created this revision.Aug 26 2020, 6:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 26 2020, 6:41 PM
vitalybuka requested review of this revision.Aug 26 2020, 6:41 PM
vitalybuka updated this revision to Diff 288153.

typo

so there is no reason to limit some sequences by MaxLookup.

Except avoiding runaway compile time in some degenerate cases possibly?

so there is no reason to limit some sequences by MaxLookup.

Except avoiding runaway compile time in some degenerate cases possibly?

With the current approach it's easy to make similar degenerate case with selects or phi nodes.
It was O(N) and still O(N) of number instructions.
MaxLookup makes getUnderlyingObject O(1), but it does not make getUnderlyingObjects.

I think this would benefit from adding the testcase[s] for the pass that uses this utility that showcase the improvements here.

Oh, so wait, we need this for correctness even?
That doesn't look good on the LAA's side.

fhahn added a comment.Jun 23 2021, 8:13 AM

getUnderlyingObjects already traverse large set on instructions so
there is no reason to limit some sequences by MaxLookup.

I am not sure I completely follow this reasoning. Even if it already traverses a large number of instructions, without the depth limit it will visit even more? Do you have any estimate on the impact in terms of extra work?

llvm/test/Analysis/LoopAccessAnalysis/underlying-objects-2.ll
96

Not sure if it is really incorrect, but it is certainly a bit misleading; we will create a runtime check between gepB_plus_one and gepB9, which will never succeed I think.

The result with the patch is certainly an improvement.

getUnderlyingObjects already traverse large set on instructions so
there is no reason to limit some sequences by MaxLookup.

I am not sure I completely follow this reasoning. Even if it already traverses a large number of instructions, without the depth limit it will visit even more? Do you have any estimate on the impact in terms of extra work?

On real program almost always both version will do the same work and default MaxLookup will not be reached.

However there are possible two edge-cases with large number of nodes:
a) Long chain of N values:

a = b, b = c, c = d .... = underlying_object

b) Wide graph with total N values:

a = (a, b, c ....) , a = (a1, a2, ...), b = (b1, b2, ...)...

Existing algorithm will fail on "a" but exits in constant time, it may be able discover all objects on b (if it's not too deep), and but in O(N) time.
Patched algorithm discover all objects, in O(N) in both cases.

So my point is if we accept O(N) for b) why not accept O(N) for a) and let callers avoid thinking about MaxLookup.

Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2022, 10:29 PM

We hit upon one test case (attached test.ll) blocked by this MaxLookup: the default value of MaxLookup is not enough to identify the underlying object, and the alias info added by AddAliasScopeMetadata() is wrong, resulting in wrong schedule decisions in machine-scheduler.

In function widget, pointer %tmp147 in below instruction is not correctly identified by getUnderlyingObjects()

%tmp148 = load i32, i32* %tmp147, align 4, !alias.scope !25, !noalias !26

The chain of pointers is below:

store [0 x %struct.ham.2]* %arg, [0 x %struct.ham.2]** %tmp, align 8
%tmp123 = load [0 x %struct.ham.2]*, [0 x %struct.ham.2]** %tmp, align 8
%tmp127 = bitcast [0 x %struct.ham.2]* %tmp123 to i8*
%tmp128 = getelementptr i8, i8* %tmp127, i64 %tmp126
%tmp136 = getelementptr inbounds i8, i8* %tmp128, i64 %tmp135
%tmp137 = bitcast i8* %tmp136 to [24 x i8]*
%tmp138 = bitcast [24 x i8]* %tmp137 to %struct.ham.2*
%tmp139 = getelementptr inbounds %struct.ham.2, %struct.ham.2* %tmp138, i32 0, i32 1
%tmp140 = bitcast [4 x %struct.spam.3]* %tmp139 to i8*
%tmp141 = getelementptr i8, i8* %tmp140, i64 -4
%tmp146 = getelementptr inbounds i8, i8* %tmp141, i64 %tmp145
%tmp147 = bitcast i8* %tmp146 to i32*

It looks like there is real program affected by this hard limit, and it produced wrong result. Can we think about this twice?

By the way, I used below command line to reproduce the issue.

opt --vector-library=MASSV -passes=default<O3> -aa-pipeline=default -mcpu=pwr9 -o test.err.bc test.ll

If you're seeing a miscompile, something is probably wrong with the caller. getUnderlyingObjects() is, in general, not guaranteed to produce an identifiable object. If the caller cares, it should check; for example, GlobalsAAResult::getModRefInfoForArgument checks all_of(Objects, isIdentifiedObject).

If you're seeing a miscompile, something is probably wrong with the caller. getUnderlyingObjects() is, in general, not guaranteed to produce an identifiable object. If the caller cares, it should check; for example, GlobalsAAResult::getModRefInfoForArgument checks all_of(Objects, isIdentifiedObject).

I am not sure why do don't want to make it guaranty with a patch like this?

If you're seeing a miscompile, something is probably wrong with the caller. getUnderlyingObjects() is, in general, not guaranteed to produce an identifiable object. If the caller cares, it should check; for example, GlobalsAAResult::getModRefInfoForArgument checks all_of(Objects, isIdentifiedObject).

I am not sure why do don't want to make it guaranty with a patch like this?

How could we possibly guarantee that? In general, we're going to find some opaque thing that both getUnderlyingObjects() and its caller can't understand.

I mean, I guess we could define an API that narrowly guarantees it looks though all bitcast, addrspacecast, gep, phi, and select operations. But that would be expensive in general, and it's not obvious to me it solves anything.

If you're seeing a miscompile, something is probably wrong with the caller. getUnderlyingObjects() is, in general, not guaranteed to produce an identifiable object. If the caller cares, it should check; for example, GlobalsAAResult::getModRefInfoForArgument checks all_of(Objects, isIdentifiedObject).

I am not sure why do don't want to make it guaranty with a patch like this?

How could we possibly guarantee that? In general, we're going to find some opaque thing that both getUnderlyingObjects() and its caller can't understand.

I mean, I guess we could define an API that narrowly guarantees it looks though all bitcast, addrspacecast, gep, phi, and select operations. But that would be expensive in general, and it's not obvious to me it solves anything.

Right, I totally forgot the point of the patch :)
I still think removing MaxLookup is good, and simplifies callers side.

If you're seeing a miscompile, something is probably wrong with the caller. getUnderlyingObjects() is, in general, not guaranteed to produce an identifiable object. If the caller cares, it should check; for example, GlobalsAAResult::getModRefInfoForArgument checks all_of(Objects, isIdentifiedObject).

Thank you for pointing out the direction. It seems the caller AddAliasScopeMetadata() is making the false assumption without additional check. I will dig a little bit into history, and maybe submit a patch to fix that.

lebedev.ri resigned from this revision.Jan 12 2023, 5:30 PM

This review may be stuck/dead, consider abandoning if no longer relevant.
Removing myself as reviewer in attempt to clean dashboard.

vitalybuka planned changes to this revision.May 24 2023, 5:25 PM