This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] AMDGPUAAResult::pointsToConstantMemory should not use the default MaxLookup (i.e., 6) to limit getUnderlyingObject
Abandoned, Public

Authored by jizhuoran on Aug 28 2020, 9:33 PM.

Details

Summary

The default value of MaxLookup is 6, which limits the number of instructions that are stripped off when getting the underlying object. The 'Loop Unroll' and 'Loop Strength Reduction' passes tend to replace the memory index 'base + i * offset' with 'base, base += offset, base += offset'. This increases the depth of the underlying object, so pointsToConstantMemory may fail to identify that a pointer's underlying object is a NoAlias and ReadOnly argument. That leads to false memory load scheduling dependencies and prevents the instruction scheduler from pipelining the memory load operations.

The default MaxLookup is too small for this case. In general, even the first memory load has a GEP and a bitcast. The false memory dependency begins at the fifth memory load from global memory (an argument), which results in less efficient assembly code.

Specifying a larger MaxLookup value or even an unlimited MaxLookup (i.e., 0) solves this problem.
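
To illustrate the depth limit described above, here is a minimal toy model in C++. It is not LLVM code: ToyValue, toyUnderlyingObject, and resolvesToArgument are hypothetical names made up for this sketch, and each "derivation" stands in for one GEP or cast that would be stripped. Only the loop bound (stop after MaxLookup steps, with 0 meaning no limit) follows the behavior described in the summary. The load at which a real kernel hits the limit depends on the shape of its IR, so the depths used here are illustrative only.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an IR value (hypothetical, not an LLVM class).
// Base == nullptr marks a root object such as a function argument;
// otherwise the value is derived from Base by one GEP or cast.
struct ToyValue {
  const ToyValue *Base = nullptr;
};

// Depth-limited walk toward the root: strip one derivation per step and
// give up after MaxLookup steps. MaxLookup == 0 means "no limit".
const ToyValue *toyUnderlyingObject(const ToyValue *V, unsigned MaxLookup = 6) {
  assert(V && "expects a non-null value");
  for (unsigned Count = 0; MaxLookup == 0 || Count < MaxLookup; ++Count) {
    if (!V->Base)
      return V; // reached the root object
    V = V->Base;
  }
  return V; // may be an intermediate value if the limit was hit
}

// Builds argument <- p1 <- p2 <- ... <- pDepth (as produced by repeated
// 'base += offset') and reports whether the walk from pDepth reaches the
// argument under the given limit.
bool resolvesToArgument(unsigned Depth, unsigned MaxLookup) {
  std::vector<ToyValue> Chain(Depth + 1); // Chain[0] is the argument
  for (unsigned I = 1; I <= Depth; ++I)
    Chain[I].Base = &Chain[I - 1];
  return toyUnderlyingObject(&Chain[Depth], MaxLookup) == &Chain[0];
}
```

In this model, resolvesToArgument(6, 6) succeeds, resolvesToArgument(7, 6) stops one step short of the argument, and resolvesToArgument(7, 0) succeeds again because the walk is unbounded. An unbounded walk trades analysis time for precision on long pointer chains.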

Event Timeline

jizhuoran created this revision. Aug 28 2020, 9:33 PM
jizhuoran requested review of this revision. Aug 28 2020, 9:33 PM
jizhuoran abandoned this revision. Aug 28 2020, 9:51 PM