This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] AMDGPUAAResult::pointsToConstantMemory should not use the default MaxLookup (i.e., 6) to limit getUnderlyingObject
Needs Review (Public)

Authored by jizhuoran on Aug 28 2020, 10:22 PM.

Details

Summary

The default value of MaxLookup is 6, which limits the number of instructions that getUnderlyingObject strips off when searching for the underlying object. The 'Loop Unroll' and 'Loop Strength Reduction' passes tend to replace the memory index 'base + i * offset' with 'base, base += offset, base += offset'. This increases the depth of the chain leading to the underlying object, so pointsToConstantMemory may fail to identify that a pointer's underlying object is a NoAlias and ReadOnly Argument. The result is a false scheduling dependency between memory loads, which prevents the instruction scheduler from pipelining the memory load operations.

The default MaxLookup is too small for this case. In general, even the first memory load has a GEP and a bitcast. The false memory dependency begins at the fifth memory load from a global memory pointer (an argument). This results in less efficient assembly code.

Specifying a larger MaxLookup value or even an unlimited MaxLookup (i.e., 0) solves this problem.
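
The effect of the lookup limit can be sketched with a toy model. This is not LLVM code: the Value struct, Kind enum, and the helpers underlyingObject, buildChain, and resolvesToArgument are hypothetical stand-ins that only mimic the shape of a bounded walk over pointer-forwarding instructions, with 0 meaning "no limit". The chain layout (one bitcast, one GEP, then one extra GEP per 'base += offset' step) is an assumption chosen for illustration, so the depth at which the walk fails here need not match a real kernel.

```cpp
#include <vector>

// Toy stand-ins for IR values; these are NOT LLVM's real classes.
enum class Kind { Argument, GEP, BitCast };

struct Value {
  Kind kind;
  const Value *operand; // pointer operand; nullptr for an Argument
};

// Bounded walk towards the underlying object: strip at most maxLookup
// pointer-forwarding instructions. maxLookup == 0 means "no limit".
const Value *underlyingObject(const Value *v, unsigned maxLookup) {
  for (unsigned count = 0; maxLookup == 0 || count < maxLookup; ++count) {
    if (v->kind == Kind::Argument)
      return v;       // reached the base object
    v = v->operand;   // strip one GEP or bitcast
  }
  return v; // limit hit: may still be a GEP rather than the argument
}

// Builds argument -> bitcast -> GEP, followed by `increments` extra GEPs
// that model the repeated `base += offset` steps. The last element is the
// pointer used by the final load.
std::vector<Value> buildChain(unsigned increments) {
  std::vector<Value> chain;
  chain.reserve(increments + 3); // no reallocation, so pointers stay valid
  chain.push_back({Kind::Argument, nullptr});
  chain.push_back({Kind::BitCast, &chain[0]});
  chain.push_back({Kind::GEP, &chain[1]});
  for (unsigned i = 0; i < increments; ++i)
    chain.push_back({Kind::GEP, &chain.back()});
  return chain;
}

// True if the bounded walk from the last pointer reaches the argument.
bool resolvesToArgument(unsigned increments, unsigned maxLookup) {
  std::vector<Value> chain = buildChain(increments);
  return underlyingObject(&chain.back(), maxLookup)->kind == Kind::Argument;
}
```

In this model a pointer that sits six instructions above the argument still resolves with a limit of 6, while one that sits seven instructions above does not. A limit of 0 resolves any depth, at the cost of walking the whole chain.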

Event Timeline

jizhuoran created this revision. Aug 28 2020, 10:22 PM
jizhuoran requested review of this revision. Aug 28 2020, 10:22 PM

Unlimited is really too aggressive. It can slow down compilation dramatically in some cases.
Also it would be nice to see a relevant test.

I would agree with Stas here.
If you can identify the patterns that require a lookup deeper than 6 levels, you can probably formulate an exact threshold.
And adding tests for such a pattern would make it clear.

arsenm resigned from this revision. Sep 28 2022, 2:10 PM

Removing the bound completely is too aggressive, and the change needs a test case.