This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Check uses of found Clobber in writtenBetween.
ClosedPublic

Authored by fhahn on Feb 16 2022, 3:39 AM.

Details

Summary

Currently writtenBetween can miss clobbers of Loc between End and Clobber
if Loc is not clobbered by End itself. End's defining accesses only have
to contain defs/phis that may clobber memory clobbered by End. But
writtenBetween needs to check for all write-clobbers of Loc.

To guarantee we see all write clobbers of Loc between the found Clobber
and End, I think we have to look at all uses of Clobber between Clobber
and End.

This fixes 2 mis-compiles illustrated in
llvm/test/Transforms/MemCpyOpt/memcpy-byval-forwarding-clobbers.ll

The implementation in this patch may consider write-clobbers on paths
that do not reach End as writes between Clobber and End. There's
potential to improve this as follow-up.

Diff Detail

Event Timeline

fhahn created this revision.Feb 16 2022, 3:39 AM
fhahn requested review of this revision.Feb 16 2022, 3:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2022, 3:39 AM
nikic added a comment.Feb 16 2022, 5:32 AM

I'm not sure this is the right approach. The getDefininingAccess() here is essentially only supposed to skip over End itself during the clobber walk. As you correctly note, this is not correct if End is a MemoryUse. But shouldn't we be able to still do basically the same thing, but with the previous def in the block? Something like stepping one back on getReverseDefsIterator()?

fhahn updated this revision to Diff 409338.Feb 16 2022, 11:03 AM

I'm not sure this is the right approach. The getDefininingAccess() here is essentially only supposed to skip over End itself during the clobber walk. As you correctly note, this is not correct if End is a MemoryUse. But shouldn't we be able to still do basically the same thing, but with the previous def in the block? Something like stepping one back on getReverseDefsIterator()?

I wasn't aware of the available reverse iterators for defs/accesses. I updated the code to recursively walk over the previous accesses. Because the iterator is block-only, we'd need to also go through the predecessors manually I think which makes it a bit trickier, unless there's already some existing functionaltiy I missed

I think we also need to do that for MemoryDefs, if End doesn't clobber Loc, because then the defining access could be optimized by skipping any accesses to Loc unless I am missing something.

fhahn updated this revision to Diff 409964.Feb 18 2022, 9:49 AM

Update this patch to just scan if start & end are in the same block, otherwise conservatively return true. It might be easier to fix the mis-compile first and subsequently re-add cross-bb scanning.

nikic added a comment.Feb 19 2022, 1:47 AM

I think we also need to do that for MemoryDefs, if End doesn't clobber Loc, because then the defining access could be optimized by skipping any accesses to Loc unless I am missing something.

Unlike MemoryUses, MemoryDefs have a separate defining access and optimized access. The defining access always points to the dominating MemoryDef/MemoryPhi, regardless of whether it clobbers any locations of the access. The optimized access does take clobbers into account.

If you are looking for a quick fix here, then the way to do it is to use the code as-is for MemoryDefs and only limit to the single-BB walk for MemoryUses. That should fix the issue with readonly byval calls, while keeping the much more important memcpy-memcpy dependence optimization working as before.

fhahn updated this revision to Diff 410296.Feb 21 2022, 7:04 AM

I think we also need to do that for MemoryDefs, if End doesn't clobber Loc, because then the defining access could be optimized by skipping any accesses to Loc unless I am missing something.

Unlike MemoryUses, MemoryDefs have a separate defining access and optimized access. The defining access always points to the dominating MemoryDef/MemoryPhi, regardless of whether it clobbers any locations of the access. The optimized access does take clobbers into account.

Ah right, the walker always only uses the optimized accesses for MemoryUses!

If you are looking for a quick fix here, then the way to do it is to use the code as-is for MemoryDefs and only limit to the single-BB walk for MemoryUses. That should fix the issue with readonly byval calls, while keeping the much more important memcpy-memcpy dependence optimization working as before.

Done in the latest version, thanks!

nikic accepted this revision.Feb 21 2022, 7:21 AM

LGTM

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
376

I'd move the getClobberingMemoryAccess() call into the !isa<MemoryUse> branch. Not much point doing it if we're going to discard it.

This revision is now accepted and ready to land.Feb 21 2022, 7:21 AM
This revision was landed with ongoing or failed builds.Feb 21 2022, 8:55 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.Feb 21 2022, 8:55 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
376

I reordered the checks in the committed version, thanks!