This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] getUnderlyingObject() look through phi.
AbandonedPublic

Authored by caojoshua on Jun 3 2023, 2:36 PM.

Details

Summary

For a pointer phi, we can strenghten the underlying object if each of
its incoming pointers are the same, or the PHI itself.

Some benefits are:

  • Stronger alias analysis
  • Folding phis that point to a zero initialized constant array

Diff Detail

Event Timeline

caojoshua created this revision.Jun 3 2023, 2:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2023, 2:36 PM
caojoshua updated this revision to Diff 528164.Jun 3 2023, 4:29 PM
caojoshua edited the summary of this revision. (Show Details)

small updates

caojoshua updated this revision to Diff 528166.Jun 3 2023, 4:31 PM

small update

I think this analysis of objects is correct. Would appreciate if reviewers let me know whether my understanding is correct or not. This change is on top of a local NFC to make getUnderlyingObject() recursive.

Some caveats:

  • With unbounded lookups, this could increase compile times since this is a frequently used API. However, there is a default max lookup of 6. I only see one caller that uses a different value of 0, for infinite max lookups. I think overall compile time should see minimal impact.
  • This change unintentionally made a bunch of changes to codegen tests. The tests with small changes look fine to me, but I'm unsure about @wrongUseOfPostDominate. I have not work with LLVM CodeGen before, so I don't want to dive too deep into this test in case the idea in this patch is completely wrong. I'd like to get a first round of feedback before I look into it.
llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll
884

Is this transformation legal? I believe it is. I tried putting this in alive2, but compilation is failing without any output.. I tried an example in godbolt which is functionally the same, except we GEP directly from the array base. The generated code in that case eliminates the bounds check, so I think it should be fine here as well.

Although the test goes through multiple passes, the removal of the bounds check is occurring during loop-vectorize, and I'm not familiar with the pass. It's not clear to me how this patch is affecting anything in loop-vectorize.

caojoshua published this revision for review.Jun 3 2023, 4:45 PM
goldstein.w.n added inline comments.Jun 3 2023, 5:22 PM
llvm/lib/Analysis/ValueTracking.cpp
5647

How often is the cache hit?

5691

Shouldn't V startout as PHI->getIncomingValue(0)?

caojoshua updated this revision to Diff 528187.Jun 3 2023, 10:26 PM
caojoshua edited the summary of this revision. (Show Details)

Logic was not quite right. After updating, a bunch more tests are changed.

caojoshua added inline comments.Jun 3 2023, 10:34 PM
llvm/lib/Analysis/ValueTracking.cpp
5647

Not much with default MaxLookup = 6.

The reason I add this is because with looking through loop PHI's, we will inifinite loop. I don't actually run into this in the test suite, but if I hardcode MaxLookup to default to 0, we infinite loop in 11 InstCombine tests.

5691

I changed the logic so that we initially set NewUnderlying to getIncomingValue(0). This also requires that we check the phi has >0 arguments, or else some JumpThreading tests fail. Its functionally the same.

llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-phi.ll
131

I don't understand AMDGPU code, but I believe we solve the FIXME by promoting the alloca.

llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
859

Similar to ARM/iv_pointer.ll changes. We remove the bounds checks.

nikic requested changes to this revision.Jun 4 2023, 12:13 AM

If you upload the base revision I can check, but I'm pretty confident that this is not viable from a compile-time perspective. getUnderlyingObject() is very performance critical, and we rely on the fact that this operation is "essentially free". We have a separate getUnderlyingObjects() function that can recurse though selects and phis.

This revision now requires changes to proceed.Jun 4 2023, 12:13 AM
caojoshua updated this revision to Diff 528192.Jun 4 2023, 2:00 AM

Capture Count variable by reference, and increment it once each time Visit() is called.

This hopefully alleviates some compile-time concerns. When looking through PHIs, each operand will increment the same Count.

If you upload the base revision I can check, but I'm pretty confident that this is not viable from a compile-time perspective. getUnderlyingObject() is very performance critical, and we rely on the fact that this operation is "essentially free". We have a separate getUnderlyingObjects() function that can recurse though selects and phis.

Please check if you are able to. I have uploaded to https://github.com/caojoshua/llvm-project/tree/underlyingphi.

My hope is that with the default MaxLookup = 6, the compile time does not see significant impact. I just uploaded some new changes that hopefully makes compile time better.

nikic added a comment.Jun 4 2023, 3:30 AM

Results: https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions%3Au&branch=nikic/perf/underlyingphi

As expected, a major regression. I've left some suggestions for possible improvements.

llvm/lib/Analysis/ValueTracking.cpp
5648

SmallDenseMap

5650

Don't use std::function

5653

Only use Visited on the phi path.

caojoshua abandoned this revision.Jun 4 2023, 10:10 AM

Results: https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions%3Au&branch=nikic/perf/underlyingphi

As expected, a major regression. I've left some suggestions for possible improvements.

The NFC patch itself creates a regression. I don't think the recursion is an issue, since I see on my local build the tail recursion is eliminated. I suspect removing the std::function might be sufficient to eliminate regression.

For the main patch, it looks unrealistic to eliminate compile-time regression. I think this functionality could be useful for some callers, maybe it would be good to create a different code path that does not touch the existing getUnderlyingObject.

I tried reducing the compile time in https://github.com/caojoshua/llvm-project/tree/underlyingperf. From local results, I'm able to eliminate regressions in the NFC by moving away from std::function. Regressions are still very significant for the functionality changing patch.