This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Extending isOverwrite to support offsetted fully overlapping stores
ClosedPublic

Authored by fvrmatteo on Mar 1 2021, 3:50 AM.

Details

Summary

The isOverwrite function is making sure to identify if two stores are fully overlapping and ideally we would like to identify all the instances of OW_Complete as they'll yield possibly killable stores. The current implementation is incapable of spotting instances where the earlier store is offsetted compared to the later store, but still fully overlapped. The limitation seems to lie on the computation of the base pointers with the GetPointerBaseWithConstantOffset API that often yields different base pointers even if the stores are guaranteed to partially overlap (e.g. the alias analysis is returning AliasResult::PartialAlias).

The patch relies on the offsets computed and cached by BatchAAResults (available after https://reviews.llvm.org/D93529) to determine if the offsetted overlapping is OW_Complete.

Testing with ninja check-llvm didn't throw unexpected failures but I'll gladly update the patch and run more tests if this is something worth adding to the DSE pass.

Diff Detail

Event Timeline

fvrmatteo created this revision.Mar 1 2021, 3:50 AM
fvrmatteo requested review of this revision.Mar 1 2021, 3:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2021, 3:50 AM
nikic requested changes to this revision.Mar 1 2021, 4:07 AM
nikic added a subscriber: nikic.

Using scalar evolution for this purpose is definitely inappropriate. I would suggest to try basing this on D93529 instead.

This revision now requires changes to proceed.Mar 1 2021, 4:07 AM
fvrmatteo updated this revision to Diff 327172.Mar 1 2021, 10:23 AM
fvrmatteo added a subscriber: dfukalov.

@nikic I based it on the WIP patch you suggested and it works fine, so I updated the differential to reflect that it'll be doable without ScalarEvolution once the patch will be committed.

I noticed @dfukalov planned to fix the DSE after their patch, so I'm not sure if I should delete mine, as their fixes/improvements may be covering more cases compared to my isolated test file.

Is there official documentation on how to submit/work on a patch based on other WIP patches?

nikic added a comment.Mar 1 2021, 1:27 PM

Is there official documentation on how to submit/work on a patch based on other WIP patches?

You can add dependencies by clicking on "Edit Related Revisions" on the right. I've added the parent revision for you.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
478

You are performing two queries here, one in AA.isMustAlias above and another here. While caching mitigates some of the cost, it's better to only call AA.alias() and then check if the result is MustAlias/PartialAlias.

fvrmatteo updated this revision to Diff 327307.Mar 1 2021, 4:00 PM

Using a single AA.alias query.

fvrmatteo marked an inline comment as done.Mar 2 2021, 1:23 AM
fvrmatteo added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
478

Thanks, I completely overlooked it!

fvrmatteo marked an inline comment as done.Mar 2 2021, 1:36 AM

I noticed @dfukalov planned to fix the DSE after their patch, so I'm not sure if I should delete mine, as their fixes/improvements may be covering more cases compared to my isolated test file.

I planned to work on DSE after GVN (eliminating redundant loads).
It seems these two fixes may be independent, but you can check my GVN approach in the diff: https://reviews.llvm.org/D93529?id=313082.
My test case for DSE fix is

define void @foo(float* %arg, i32 %i) {
bb:
  %i7 = add nuw nsw i32 %i, 1
  %i8 = zext i32 %i7 to i64
  %i9 = getelementptr inbounds float, float* %arg, i64 %i8
  store float undef, float* %i9, align 4

  %i2 = zext i32 %i to i64
  %i3 = getelementptr inbounds float, float* %arg, i64 %i2
  %i4 = bitcast float* %i3 to <2 x float>*
  store <2 x float> <float undef, float undef>, <2 x float>* %i4, align 16

  ret void
}

and suggest to add it to the patch since it uses vector types instead of arrays.

fhahn added a comment.Mar 2 2021, 9:38 AM

Using scalar evolution for this purpose is definitely inappropriate. I would suggest to try basing this on D93529 instead.

While using scalar evolution for the problem at hand is inappropriate, it might be worth a try to use SCEV to add 'loop-aware' DSE to catch cases like https://bugs.llvm.org/show_bug.cgi?id=46460. Just some food for thought in case anyone is interested in tackling this issue :)

llvm/test/Transforms/DeadStoreElimination/MSSA/offsetted-overlapping-stores.ll
6

It would probably worth to add a few additional cases with different offsets & store types, including cases where the do not fully overlap.

nikic added inline comments.Mar 2 2021, 2:17 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
467

Why not continue using P1 and P2 here? Of course AA will also strip pointer casts, but if we already did it, we may as well pass the stripped pointers.

471

As you are performing the alias query unconditionally now, the separate P1 == P2 shortcut here isn't useful anymore, you can drop it. (I also checked whether your change may have a compile-time impact due to the unconditional query, but it does not seem so.)

1314

Is BatchAA safe to use in the non-MSSA DSE?

It seems these two fixes may be independent, but you can check my GVN approach in the diff: https://reviews.llvm.org/D93529?id=313082.

I'm reading the GVN source code to learn about it and understand your approach. I'll try to support the dead store example you shared. I'm probably missing something, but it seems the GVN approach is based on information deriving from the Memory Dependence Analysis (and related API), while here I'd like to find a solution suitable for both MD and MSSA in the isOverwrite function, hence I'll try to generalise the check to support the vector types.

Just some food for thought in case anyone is interested in tackling this issue :)

Assuming I'll be able to come up with a proper solution to this issue, that surely would be another occasion to learn something more about LLVM.

I'll update the patch as soon as I can figure out a reasonable way to support the different types and will update the test files accordingly.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
467

I agree I should avoid doing the same action multiple times, although I used Later and Earlier because BatchAAResults is proving an alias() function expecting MemoryLocation arguments.

471

Indeed, thanks for pointing it out. Glad that for now the compile-time impact is negligible.

1314

I'm not sure it's safe because on the MD-based DSE I see that the IR is modified and the BatchAAResults should be used when there are no IR changes in between queries. At the same time plain AliasAnalysis is not providing the offset information, so I guess I should be dropping BatchAA here for now, even though the isOverwrite will try to call getClobberOffset on the AA reference.

fvrmatteo updated this revision to Diff 327784.Mar 3 2021, 7:14 AM

The new diff just adds some additional tests, one showing that the DSE vector test proposed by @dfukalov is properly handled and two showing that non-fully overlapping stores are kept untouched. The unneeded short-circuit condition is also removed.

@fhahn Do you think it's worth to add other specific tests not involving plain integers, vectors and arrays? Do the non-overlapping tests show what you expected?

nikic added inline comments.Mar 3 2021, 11:43 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1314

Some types of IR modifications (removing instructions if no new ones are added) can be okay, and the MSSA-based implementation uses that. I'm just not familiar with the old one. Maybe we can avoid the question altogether though, by just deleting this code? D97877

nikic added inline comments.Mar 7 2021, 9:38 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
467

Oh right, that makes sense. You might want to move the P1/P2 declarations lower in the code, where they are needed.

480

nit: Unnecessary parentheses.

1314

This code is gone now, can you please rebase?

llvm/test/Transforms/DeadStoreElimination/MSSA/offsetted-overlapping-stores.ll
48

Please store something other than undef here and below (0.0 is fine).

nikic added a comment.Mar 7 2021, 9:39 AM

Could you please also update the patch summary, as this is no longer using SCEV?

fvrmatteo updated this revision to Diff 328958.Mar 8 2021, 2:18 AM
fvrmatteo edited the summary of this revision. (Show Details)

Updated patch to reflect the requested changes. Updated the revision description too.

fvrmatteo marked 7 inline comments as done.Mar 8 2021, 2:23 AM

Could you please also update the patch summary, as this is no longer using SCEV?

Updated the patch summary, converted undef to 0.0, rebased to remove the MemDep DSE changes, removed the unnecessary parentheses and moved P1/P2 right before the first usage.

nikic accepted this revision.Mar 8 2021, 9:07 AM

LGTM

This revision is now accepted and ready to land.Mar 8 2021, 9:07 AM

Thanks for guiding me on this patch. I can't commit the patch myself as I don't have commit access.