Page MenuHomePhabricator

Add ability to detect no-alias between different fields of the same structure
Needs RevisionPublic

Authored by pawosm01 on Sep 28 2021, 10:53 AM.

Details

Summary

BasicAliasAnalysis tends to give up on proving no-aliasing between
two memory accesses when they share the same underlying object (e.g
when they are fields of the same structure). Such simplified approach
results in losing optimization opportunities that competing compilers
are able to exploit. Namely, when a loop invariant is loaded from the
i-th element of some array being a field of some global structure and
on each iteration i-th element of another array being a different
field of the same global structure is written to, the load is not
hoisted resulting in tragic performance at -O3 comparing to what
other compiler can achieve at the same optimization level.

This patch adds some additional logic that checks just for that:
it returns NoAlias when accessed elements of the same underlying
object are two different fields of the same structure.

Some of the test cases had to be modified in global_alias.ll which
represented the situation addressed by this patch.

This patch may seem somewhat constrained as it was focused on
improving performance of the particular loop I was working with. Doing
it any more general results in a certain number of failing test cases.

Diff Detail

Unit TestsFailed

TimeTest
180 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-cxa-atexit.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-cxa-atexit.S
180 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-static-initializer.S
Script: -- : 'RUN: at line 7'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-static-initializer.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-static-initializer.S
180 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-tls.S
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-tls.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-tls.S

Event Timeline

pawosm01 created this revision.Sep 28 2021, 10:53 AM
pawosm01 requested review of this revision.Sep 28 2021, 10:53 AM
pawosm01 edited the summary of this revision. (Show Details)Sep 28 2021, 10:56 AM
pawosm01 added subscribers: tstellar, jdoerfert.
nikic requested changes to this revision.Sep 28 2021, 11:06 AM
nikic added a subscriber: nikic.

BasicAA doesn't do this because it's wrong. It is legal to overindex an array in a structure, as long as the bounds of the underlying object are not exceeded, even for inbounds GEP.

This revision now requires changes to proceed.Sep 28 2021, 11:06 AM
lebedev.ri requested changes to this revision.Sep 28 2021, 11:23 AM
lebedev.ri added a subscriber: lebedev.ri.

This is valid given C++ semantics, but not LLVM IR semantics.
I have it my queue to encode the C++ semantics, hang on.

I do realize that this patch isn't the best solution for this problem, but I wonder what it would be. Maybe LICM (where the decision whether to hoist is made) should rely on something more than BasicAA, but which pass should do such analysis and basing on what criteria?

nikic added a comment.Sep 29 2021, 8:55 AM

I do realize that this patch isn't the best solution for this problem, but I wonder what it would be. Maybe LICM (where the decision whether to hoist is made) should rely on something more than BasicAA, but which pass should do such analysis and basing on what criteria?

The correct solution for this is implemented in D109746, but still needs to be combined with additional range information encoded by the frontend and/or stronger range analysis.