Page MenuHomePhabricator

[BasicAA] Fix caching in the presence of phi cycles

Authored by nikic on Oct 23 2020, 12:01 PM.



Any time we insert a block into VisitedPhiBBs, previously cached values may no longer be valid for the recursive alias queries. As such, perform them using an empty AAQueryInfo. (Note that if we recurse to the same phi, the block will already be inserted, so we reuse the old AAQueryInfo, and thus still protect against infinite recursion.)

This is a different approach to fixing this problem than D90062, I'm not sure what the tradeoffs are yet.

Diff Detail

Event Timeline

nikic created this revision.Oct 23 2020, 12:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2020, 12:01 PM
nikic requested review of this revision.Oct 23 2020, 12:01 PM

Thank you for this patch, I think this is a better general direction.

  1. Did you happen to test the compile-time impact for this? (it's a correctness fix, and needed, but it would be good to note on what to expect)
  1. I think the IsCapturedCache cache can be reused in the new AAQI. Ideally the new AAQI would use a reference to it, not a copy, which would need a bit of a re-write but may be worth as far as performance.
  1. Could you add to the test something like this (this was defeating my patch):
BatchAAResults BatchAA2(AA);
EXPECT_EQ(NoAlias, BatchAA2.alias(A1Loc, A2Loc));
EXPECT_EQ(MayAlias, BatchAA2.alias(S1Loc, S2Loc));
EXPECT_EQ(MayAlias, BatchAA2.alias(PhiLoc, A1Loc));

Looking further to see if there's anything I missed.

Here are the compile-time numbers for this patch: The impact actually seems pretty low, so I think this is a viable approach.

nikic updated this revision to Diff 300406.Oct 23 2020, 2:09 PM

Add additional BatchAA test with different query order.

nikic added a comment.Oct 23 2020, 2:27 PM

I've now added the additional test.

Regarding the IsCapturedCache, I agree that this is safe to reuse. I thought we might get away with just temporarily swapping out the AliasCache and keeping the rest (, but it turns out this has worse performance impact ( Probably because swapping out a SmallDenseSet of MemoryLocation pairs is expensive -- these are large structures.

For efficient reuse we'd want to convert the capture cache into a pointer that either points into an owned cache in AAQueryInfo, or into a separate cache. However, that introduces a self-reference, so we'd also have to deal with custom move/copy constructors/assignments. Given that the impact doesn't seem so large, I'm not sure it's worth the bother.

nikic added a comment.Oct 23 2020, 3:27 PM

Here's how retaining the capture cache could look like (on top of this patch):
Doing this does improve things a little bit: (The small tramp3d-v4 ReleaseThinLTO regression is no longer present.)
But maybe not enough to justify the complexity.

asbirlea accepted this revision.Oct 23 2020, 3:42 PM

Thank you for the numbers and the additional variants. I agree it's good to move forward with this approach as is.

This revision is now accepted and ready to land.Oct 23 2020, 3:42 PM
This revision was automatically updated to reflect the committed changes.