This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Relax libcall checks
ClosedPublic

Authored by nikic on Jul 25 2021, 8:40 AM.

Details

Summary

Rather than blocking the whole MemCpyOpt pass if the libcalls are not available, only disable creation of new memset/memcpy intrinsics where none existed previously. This only affects the store merging and load-store conversion optimization. Other optimizations are derived from existing intrinsics, which are well-defined in the absence of libcalls -- not having the libcalls just means that call simplification won't convert them to intrinsics.

This is a weaker variation of D104801, which dropped these checks entirely.

Diff Detail

Event Timeline

nikic created this revision.Jul 25 2021, 8:40 AM
nikic requested review of this revision.Jul 25 2021, 8:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2021, 8:40 AM
nikic updated this revision to Diff 361512.Jul 25 2021, 8:53 AM

Add test

tra accepted this revision.Jul 26 2021, 10:44 AM

LGTM in general.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
677

I'd add a TODO/FIXME. The only reason we need to do it is because of the assumption in some places (sanitizeers and some downstream back-ends) that we can't handle LLVM intrinsic w/o the matching libcall. Materializing LLVM intrinsics is not illegal and other passes do it.

803

Ditto.

llvm/test/Transforms/MemCpyOpt/no-libcalls.ll
3 ↗(On Diff #361512)

This is a bit ironic as AMDGPU back-end (as well as NVPTX) *can* actually handle memory intrinsics w/o the matching libcalls.

https://godbolt.org/z/Koedbzn5x

I think it may make sense to fold D106401 into this patch and add another test for AMDGPU + all optimizations enabled by the flag.

This revision is now accepted and ready to land.Jul 26 2021, 10:44 AM

Just my thoughts, I may not have all the info here..

I think materializing memcpy and memset should be standard and not be an issue in either sanitizers or downstream backends. There is a very small subset of intrinsics where I'd expect this, where passes introducing them are already in the default pipeline (e.g. MemCpyOpt, LoopIdiomRecognize).
I also understand the push back on the reverted patch, in that it was changing the status quo as far as expectations and a standardization should be done with proper RFC and llvm-dev discussions. I just think that this is something to consider changing in the long run.

This patch and the flag in D106401 makes sense to resolve the issues for the NVPTX use case (and AMDGPU as well I assume).

llvm/test/Transforms/MemCpyOpt/no-libcalls.ll
3 ↗(On Diff #361512)

Either folding into this or as a separate patch is reasonable IMO.
@tra: do you have another test that would be relevant to add in D106401 for nvptx (beside an additional RUN line in this one with the flag I mean)?

tra added inline comments.Jul 26 2021, 3:34 PM
llvm/test/Transforms/MemCpyOpt/no-libcalls.ll
3 ↗(On Diff #361512)

If the flag is folded into this patch, we should have a test for it (see my comment above). I'll add it to D106401 if it's not folded in.

tra added a comment.Aug 3 2021, 3:41 PM

@nikic So, what is your plan for this patch? I'm fine with it whether it's combined with D106401 or not.

This revision was landed with ongoing or failed builds.Aug 4 2021, 12:18 PM
This revision was automatically updated to reflect the committed changes.
nikic marked 2 inline comments as done.