This is an archive of the discontinued LLVM Phabricator instance.

[MemoryDependencyAnalysis] Delete cache infos if CacheInfo->size != Loc.size
Needs ReviewPublic

Authored by StephenFan on Jul 2 2023, 11:51 PM.

Details

Summary

In the past, if the size value of CacheInfo was greater than Loc's size
value, the CacheInfo would not be dropped and we repalced Loc's size
with CacheInfo's size to compute the dependencies. However it caused two
problems:

  1. With a greater Loc size, we might get a imprecise result which shows in test GVN/PRE/rle.ll.

2.If we have a pointer value that can be translated to a block twice(%array.2.phi.2 in pr53599.ll), using a greater size may cause dependencies of smaller size been added to greater size's CacheInfo. And then some non-redundant load instructions would be deleted.

Compile time differences:https://llvm-compile-time-tracker.com/compare.php?from=18eb31c096167a1f6185f05338494c82a509ff9e&to=65fba50ce274460ad88f90e0ae29d36869e42d71&stat=instructions:u

Fixes: https://github.com/llvm/llvm-project/issues/63559 and https://github.com/llvm/llvm-project/issues/63611

Diff Detail

Event Timeline

StephenFan created this revision.Jul 2 2023, 11:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2023, 11:51 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
StephenFan requested review of this revision.Jul 2 2023, 11:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2023, 11:51 PM
StephenFan updated this revision to Diff 536687.Jul 3 2023, 1:52 AM
StephenFan edited the summary of this revision. (Show Details)

Also fixes another issue.

nikic added a comment.Jul 3 2023, 1:33 PM

2.If we have a pointer value that can be translated to a block twice(%array.2.phi.2 in pr53599.ll), using a greater size may cause dependencies of smaller size been added to greater size's CacheInfo. And then some non-redundant load instructions would be deleted.

Could you please explain this part in more detail? I don't really get why working with the greater size is not conservatively correct.

2.If we have a pointer value that can be translated to a block twice(%array.2.phi.2 in pr53599.ll), using a greater size may cause dependencies of smaller size been added to greater size's CacheInfo. And then some non-redundant load instructions would be deleted.

Could you please explain this part in more detail? I don't really get why working with the greater size is not conservatively correct.

Consider this example:

entry:
  %array.2 = ...

header:
  %array.2.phi.1 = phi ptr [ %array.2, %entry ], [ %array.2.phi.2, %latch ]
  br ..., label %merge, label %then_block

then_block:
  %array.2.remat = ...
  br label %merge

merge:
  %array.2.phi.2 = phi ptr [ %array.2.phi.1, %header ], [ %array.2.remat, %then_block]
  br ..., label %iter.peel, label %side_exit

iter.peel:
  %clobber_array.2.phi.2_with_size_8 = ...
  %clobber_array.2.phi.2_with_size_16 = ...
  br ..., label %latch, label %side_exit

latch:
  %clobber_array.2.phi.2 = ...
  br i1 %test_i5, label %return, label %header

side_exit:
  %val.array.2.1 = load atomic i64, ptr %array.2.phi.2 unordered, align 8

First of all, assume there is only one NonLocalDefsCache entry which its key is
(%array.2.phi.1, true), and the size is 16

  1. we want to get the clobber of instruction %val.array.2.1 = load atomic i64, ptr %array.2.phi.2 unordered, align 8 , so we need to get %array.2.phi.2's pointer dependency. According to the implementation, before the query starts, we would insert an entry to NonLocalPointerDeps which is (%array.2.phi.2, true) and with size 8. The predecessors of block %side_exit are %merge and %iter.peel. These two blocks would be added to the worklist.
  2. We first choose %merge block, there is no dependency in %merge, so we do PhiTranslation for %array.2.phi.2. For (%array.2.remat, %then_block), nothing happens. For (%array.2.phi.1, %header), there is a cache entry in NonLocalDefsCache with size 16. But currently, the Loc's size is 8. According to the implementation, we will redo the query with cache's size, which is 16.
  3. At this point, we were querying (%array.2.phi.1, %header) with size 16. At the %header block, there is no any dependency. So then we do PhiTranslation for %array.2.phi.1. For (%array.2, %entry), nothing happens. For (%array.2.phi.2, %latch), since in Step 1, we have inserted a cache entry with size 8. According to the implementation, we will clear the cache entry and set the size from 8 to 16. then we find a clobber which is %clobber_array.2.phi.2. The query returns since we have found a clobber.
  4. But according to Step 1, at this point, we just finished the query of %merge. We still need to search to %iter.peel And unfortunately, we thought the cache entry size of %array.2.phi.2 is 8, but actually, it has been set to 16 in Step 3. And because the Loc size is still 8, we would only find %clobber_array.2.phi.2_with_size_8. Then we add this clobber info (%iter.peel, %clobber_array.2.phi.2_with_size_8) to the cache. Okay, then the error has existed since we query a clobber with size 8, but added to a size 16 cache.
nikic added a comment.Jul 25 2023, 6:41 AM

A concern I have is: Can the same also happen with AATags and not just Size? The logic there is very similar. And while just clearing the cache entry for Size mismatch is probably okay, I don't think this would work for AATags (which will often differ).