This is an archive of the discontinued LLVM Phabricator instance.

[FunctionAttrs] Infer argmemonly .
ClosedPublic

Authored by fhahn on Mar 10 2022, 2:38 PM.

Details

Summary

This patch adds initial argmemonly inference, by checking the underlying
objects of locations returned by MemoryLocation.

I think this should cover most cases, except function calls to other
argmemonly functions.

I'm not sure if there's a reason why we don't infer those yet.

Additional argmemonly can improve codegen in some cases. It also makes
it easier to come up with a C reproducer for 7662d1687b09 (already fixed,
but I'm trying to see if C/C++ fuzzing could help to uncover similar
issues.)

Compile-time impact:
NewPM-O3: +0.01%
NewPM-ReleaseThinLTO: +0.03%
NewPM-ReleaseLTO+g: +0.05%

https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions

Diff Detail

Event Timeline

fhahn created this revision.Mar 10 2022, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 2:38 PM
fhahn requested review of this revision.Mar 10 2022, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 2:38 PM
nikic added a comment.Mar 10 2022, 2:49 PM

Just wondering whether integrating this into checkFunctionMemoryAccess() would make more sense? It already does all the ModRefBehavior queries, so I'd expect it to fit naturally.

FWIW, D50107. Though, I'm happily abandon that one.

Working backwards from each memory instruction seems potentially extremely expensive. We could clearly do a forward walk to identify an OK set, then return false if any memory access is in that set. Not sure the complexity is worth it though, have you run compile time measurements with this patch?

The other benefit to the forward walk is that you could process the entire SCC at once, and have speculative membership in that set.

fhahn updated this revision to Diff 414695.Mar 11 2022, 10:30 AM

Just wondering whether integrating this into checkFunctionMemoryAccess() would make more sense? It already does all the ModRefBehavior queries, so I'd expect it to fit naturally.

Sounds good, I first wanted to see if there's any concerns with adding argmemonly in general here. Updated the code to integrate the logic in checkFunctionMemoryAccess.

I also updated the logic to to allow accesses to allocas in argmemonly functions.

Working backwards from each memory instruction seems potentially extremely expensive. We could clearly do a forward walk to identify an OK set, then return false if any memory access is in that set. Not sure the complexity is worth it though, have you run compile time measurements with this patch?

The other benefit to the forward walk is that you could process the entire SCC at once, and have speculative membership in that set.

The compile-time impact was low for the original version:

NewPM-O3: +0.01%
NewPM-ReleaseThinLTO: +0.03%
NewPM-ReleaseLTO+g: +0.05%

I now updated the code to integrate the checks in the existing checkFunctionMemoryAccess and the impact is similar (+0.02%, +0.04%, +0.04%). The updated version handles a few additional cases than the original one though. https://llvm-compile-time-tracker.com/compare.php?from=1d4268224bedfcbfefbf2c902c9b6417114a688d&to=0192ddd08f36f6b6e8a847d65a4f03a75eca71e6&stat=instructions

I'm not sure if we have access to memoryssa at this point, but MSSA could probably be used to avoid iterating over all instructions in a function.

nikic added inline comments.Mar 11 2022, 12:57 PM
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
153

The way the code is structured, can't you put the fallback into an else at the end? It looks like all other branches update it.

246

To avoid enumerating all combinations, may it would be cleaner to construct ModRefInfo and then or it with either FMRL_ArgumentPointees or FMRL_Anywhere?

283

I think you'd want to ReadsMemory |= isRefSet(MR) etc below and then early exit on ReadMemory && WritesMemory && !ArgMemOnly to avoid some duplication here.

fhahn updated this revision to Diff 415198.Mar 14 2022, 1:19 PM

Fix build failure, restructured code as suggeted, thanks!

fhahn marked 3 inline comments as done.Mar 14 2022, 1:19 PM
fhahn added inline comments.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
153

Done, thanks!

246

Yes that's better, updated!

283

Thanks, using isRefSet/isModSet works well!

nikic added inline comments.Mar 15 2022, 1:36 AM
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
145

getUnderlyingObject() never returns null.

192

Hm, should these checks be going below the pointsToConstantMemory() checks, for the same reason why we don't consider a load from constant memory as a "ref"?

As an aside, pointsToConstantMemory() already does a recursive underlying object walk to check whether the pointer is based on a constant global or alloca, and this adds an additional (weaker, non-recursive) walk. It would probably make sense to convert the underlying pointerToConstantMemory() API into something like classifyLocation() that can return one of Constant, Local, Argument or Unknown, so all of these can be handled in a uniform manner.

292

I think this check if redundant, as we already check the same thing inside the loop.

fhahn updated this revision to Diff 415444.Mar 15 2022, 8:09 AM
fhahn marked 4 inline comments as done.

Address latest comments.

fhahn marked 2 inline comments as done.Mar 15 2022, 8:11 AM
fhahn added inline comments.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
192

I moved the check below AAR.pointsToConstantMemory( in all cases below.

It would probably make sense to convert the underlying pointerToConstantMemory() API into something like classifyLocation() that can return one of Constant, Local, Argument or Unknown, so all of these can be handled in a uniform manner.

Sounds good. Are you aware of any other places that would benefit from a more general version?

292

Thanks, I replaced it with an assertion.

fhahn marked 2 inline comments as done.Mar 15 2022, 8:11 AM
fhahn added inline comments.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
145

Removed the extra check, thanks!

nikic accepted this revision.Mar 15 2022, 9:58 AM

LGTM

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
192

Nope, not aware of other places.

This revision is now accepted and ready to land.Mar 15 2022, 9:58 AM
fhahn updated this revision to Diff 415730.Mar 16 2022, 1:39 AM

Rebase and fix AMDGPU test. I'm planning this after the pre-commit tests come back.

This revision was landed with ongoing or failed builds.Mar 16 2022, 3:24 AM
This revision was automatically updated to reflect the committed changes.

Seeing a test-suite failure for benchmark SingleSource/Benchmarks/Linpack/linpack-pc.test with -flto. This is not being seen without LTO.
The failing output I'm seeing is as follows:

...Linux_LTO.220324/test-suite-build-2022-03-24_14-28-44/tools/fpcmp-target: Compared: 5.949578e+05 and 1.600000e+00
abs. diff = 5.949562e+05 rel.diff = 3.718476e+05
Out of tolerance: rel/abs: 1.000000e-04/0.000000e+00

Reverting this commit allows it to pass.

@fhahn let me know if you need more details to reproduce

fhahn added a comment.Mar 27 2022, 1:32 PM

Seeing a test-suite failure for benchmark SingleSource/Benchmarks/Linpack/linpack-pc.test with -flto. This is not being seen without LTO.
The failing output I'm seeing is as follows:

...Linux_LTO.220324/test-suite-build-2022-03-24_14-28-44/tools/fpcmp-target: Compared: 5.949578e+05 and 1.600000e+00
abs. diff = 5.949562e+05 rel.diff = 3.718476e+05
Out of tolerance: rel/abs: 1.000000e-04/0.000000e+00

Reverting this commit allows it to pass.

@fhahn let me know if you need more details to reproduce

Thanks for the heads-up! I am unable to reproduce this on X86 on my system. Could you share the exact configuration & flags which cause the failure?

Seeing a test-suite failure for benchmark SingleSource/Benchmarks/Linpack/linpack-pc.test with -flto. This is not being seen without LTO.
The failing output I'm seeing is as follows:

...Linux_LTO.220324/test-suite-build-2022-03-24_14-28-44/tools/fpcmp-target: Compared: 5.949578e+05 and 1.600000e+00
abs. diff = 5.949562e+05 rel.diff = 3.718476e+05
Out of tolerance: rel/abs: 1.000000e-04/0.000000e+00

Reverting this commit allows it to pass.

@fhahn let me know if you need more details to reproduce

Thanks for the heads-up! I am unable to reproduce this on X86 on my system. Could you share the exact configuration & flags which cause the failure?

Ah that's interesting that you couldn't reproduce it on x86. I did this test on a RHEL 8.4 Power system. Here was how I compiled and ran the test:

clang  -flto    -O3 -DNDEBUG   -w -Werror=date-time -ffp-contract=off -ffp-contract=off -DFMA_DISABLED=1 -ffp-contract=off -DFMA_DISABLED=1 -o linpack-pc.c.o   -c test-suite/SingleSource/Benchmarks/Linpack/linpack-pc.c
clang -flto    -O3 -DNDEBUG      linpack-pc.c.o  -o linpack-pc  -lm
.../Linux_LTO.220324/test-suite-build-2022-03-24_14-28-44/tools/timeit-target --limit-core 0 --limit-cpu 7200 --timeout 7200 --limit-file-size 104857600 --limit-rss-size 838860800 --append-exitstatus --redirect-output linpack-pc.test.out --redirect-input /dev/null --summary linpack-pc.test.time linpack-pc
.../Linux_LTO.220324/test-suite-build-2022-03-24_14-28-44/tools/fpcmp-target -r 0.0001 linpack-pc.test.out .../Linux_LTO.220324/test-suite-build-2022-03-24_14-28-44/SingleSource/Benchmarks/Linpack/linpack-pc.reference_output
fhahn added a comment.Mar 28 2022, 8:18 AM

Ah that's interesting that you couldn't reproduce it on x86. I did this test on a RHEL 8.4 Power system. Here was how I compiled and ran the test:

clang  -flto    -O3 -DNDEBUG   -w -Werror=date-time -ffp-contract=off -ffp-contract=off -DFMA_DISABLED=1 -ffp-contract=off -DFMA_DISABLED=1 -o linpack-pc.c.o   -c test-suite/SingleSource/Benchmarks/Linpack/linpack-pc.c
clang -flto    -O3 -DNDEBUG      linpack-pc.c.o  -o linpack-pc  -lm

I am unable to build for PowerPC. Could you share linpack-pc.c.o ?

Ah that's interesting that you couldn't reproduce it on x86. I did this test on a RHEL 8.4 Power system. Here was how I compiled and ran the test:

clang  -flto    -O3 -DNDEBUG   -w -Werror=date-time -ffp-contract=off -ffp-contract=off -DFMA_DISABLED=1 -ffp-contract=off -DFMA_DISABLED=1 -o linpack-pc.c.o   -c test-suite/SingleSource/Benchmarks/Linpack/linpack-pc.c
clang -flto    -O3 -DNDEBUG      linpack-pc.c.o  -o linpack-pc  -lm

I am unable to build for PowerPC. Could you share linpack-pc.c.o ?

Sure, here it is:

fhahn added a comment.Mar 28 2022, 8:33 AM

Ah that's interesting that you couldn't reproduce it on x86. I did this test on a RHEL 8.4 Power system. Here was how I compiled and ran the test:

clang  -flto    -O3 -DNDEBUG   -w -Werror=date-time -ffp-contract=off -ffp-contract=off -DFMA_DISABLED=1 -ffp-contract=off -DFMA_DISABLED=1 -o linpack-pc.c.o   -c test-suite/SingleSource/Benchmarks/Linpack/linpack-pc.c
clang -flto    -O3 -DNDEBUG      linpack-pc.c.o  -o linpack-pc  -lm

I am unable to build for PowerPC. Could you share linpack-pc.c.o ?

Sure, here it is:

Looks like the issue is already happening when preparing for LTO. Could you also share the original input IR, before optimizations (passing -emit-llvm -mllvm -disable-llvm-optzns to the first clang invocation)?

Ah that's interesting that you couldn't reproduce it on x86. I did this test on a RHEL 8.4 Power system. Here was how I compiled and ran the test:

clang  -flto    -O3 -DNDEBUG   -w -Werror=date-time -ffp-contract=off -ffp-contract=off -DFMA_DISABLED=1 -ffp-contract=off -DFMA_DISABLED=1 -o linpack-pc.c.o   -c test-suite/SingleSource/Benchmarks/Linpack/linpack-pc.c
clang -flto    -O3 -DNDEBUG      linpack-pc.c.o  -o linpack-pc  -lm

I am unable to build for PowerPC. Could you share linpack-pc.c.o ?

Sure, here it is:

Looks like the issue is already happening when preparing for LTO. Could you also share the original input IR, before optimizations (passing -emit-llvm -mllvm -disable-llvm-optzns to the first clang invocation)?

Sure, here is the LLVM IR generated:

fhahn added a comment.Apr 4 2022, 2:18 PM

snip

Looks like the issue is already happening when preparing for LTO. Could you also share the original input IR, before optimizations (passing -emit-llvm -mllvm -disable-llvm-optzns to the first clang invocation)?

Sure, here is the LLVM IR generated:

Thanks for sharing the IR. There are a couple of places where we now infer argmemonly, but they seem to be correctly marked as argmemonly, so I suspect some other optimization may be causing the issue. Do you happen to know which function is getting miscompiled?

akdandrea added a comment.EditedApr 8 2022, 11:09 AM

snip

Looks like the issue is already happening when preparing for LTO. Could you also share the original input IR, before optimizations (passing -emit-llvm -mllvm -disable-llvm-optzns to the first clang invocation)?

Sure, here is the LLVM IR generated:

Thanks for sharing the IR. There are a couple of places where we now infer argmemonly, but they seem to be correctly marked as argmemonly, so I suspect some other optimization may be causing the issue. Do you happen to know which function is getting miscompiled?

Just to close the loop on this, it appears that I can now get this to pass with LLVM ToT on Power. So, I don't think there is any further investigation required at this time. I appreciate your help on this, thanks again