This is an archive of the discontinued LLVM Phabricator instance.

[clang][amdgpu] - Choose when to promote VarDecl to address space 4.
ClosedPublic

Authored by estewart08 on Dec 13 2021, 12:08 PM.

Details

Summary

There are instances where clang codegen creates stores to
address space 4 in ctors, which causes a crash in llc.
This store was being optimized out at opt levels > 0.

For example:

pragma omp declare target
static const double log_smallx = log2(smallx);
pragma omp end declare target

This patch ensures that any global const that does not
have constant initialization stays in address space 1.

Note - a second patch is in the works where all global
constants are placed in address space 1 during
codegen and then the opt pass InferAdressSpaces
will promote to address space 4 where necessary.

Diff Detail

Event Timeline

estewart08 created this revision.Dec 13 2021, 12:08 PM
estewart08 requested review of this revision.Dec 13 2021, 12:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 13 2021, 12:08 PM

Resubmit patch with lint.

There's no real advantage to using addrspace(4) at all

There's no real advantage to using addrspace(4) at all

There are a few places where it's used as an optimization hint / as a crutch where we don't have a proper analysis. Fundamentally I would like to eliminate addrspace(4) eventually

Patch looks ok to me. This will fix the miscompile (we end up with a store to addrspace(4) at present) without upsetting whatever hacks rely on addrspace(4). @arsenm reasonable as a point fix?

Patch looks ok to me. This will fix the miscompile (we end up with a store to addrspace(4) at present) without upsetting whatever hacks rely on addrspace(4). @arsenm reasonable as a point fix?

Yes, I'm mostly saying to not waste your time trying to infer addrspace(4) in InferAddressSpaces or anything like that

JonChesterfield accepted this revision.Dec 13 2021, 1:59 PM

That sounds good here. Infer addrspaces is pretty complicated, given marginal benefit for (4) this patch seems to hit the mark.

This revision is now accepted and ready to land.Dec 13 2021, 1:59 PM
yaxunl requested changes to this revision.Dec 13 2021, 2:15 PM

This may cause perf regressions for HIP.

clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
10

Do you know why this is not treated as constant initialization?

This revision now requires changes to proceed.Dec 13 2021, 2:15 PM
This revision was not accepted when it landed; it landed in state Needs Revision.Dec 13 2021, 2:30 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
10

There is no initialization here:

const int A::Foo;

It seems the compiler ignores the 123 in the struct.

I see address space (4) when the test is written like this:

struct A {
  static const int Foo;
};
const int A::Foo = 123;

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

The two lit tests which changed from addr space 4 to 1 demonstrated that. In alias analysis, if a variable is in addr space 4, the backend knows that it is constant and can do optimizations on it. After changing to addr space 1, those optimizations are gone.

I am not objecting to putting variables like const float x=log2(y) to addr space 1. However, the lit tests indicate the condition check is too restrictive.

clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
10

In this case, there are two Decl's for A::Foo and the initializer of A::Foo is on the first initializer whereas hasConstantInitialization only checks the final Decl, which does not have an initializer. Therefore hasConstantInitialization may conservatively return false for a variable with a constant initializer in a previous decl.

clang codegen calls VarDecl::getAnyInitializer to checks all Decls of a variable when emitting its initializer in LLVM IR

https://github.com/llvm/llvm-project/blob/5336befe8c3cde08cec020583700b4d2ba25ac16/clang/lib/AST/Decl.cpp#L2277

I would suggest calling getAnyInitializer to get the initializer, then check whether it is constant expr to determine whether the variable will have a constant initializer in IR. I think hasConstantInitialization does not do that because it has multiple usages, not just for clang codegen, therefore it does not check all decls.

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

The two lit tests which changed from addr space 4 to 1 demonstrated that. In alias analysis, if a variable is in addr space 4, the backend knows that it is constant and can do optimizations on it. After changing to addr space 1, those optimizations are gone.

The backend also knows because the constant flag is set on the global variable. Addrspace(4) is a kludge which is largely redundant with other mechanisms for indicating constants

Changing the has-constant-initialiser check to a more precise/robust one sounds reasonable to me.

The backend also knows because the constant flag is set on the global variable. Addrspace(4) is a kludge which is largely redundant with other mechanisms for indicating constants

There's an interesting edge case here - we might want variables that are assigned to from a global ctor and subsequently constant, in which case they are not 'constant' as far as IR is concerned, but they are amenable to scalar loads and will be invariant once loaded. Some tables of data that are computed once in a .ctor kernel launch and then read by other kernels. I'm not totally sure how to express that pattern in IR though.

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

The two lit tests which changed from addr space 4 to 1 demonstrated that. In alias analysis, if a variable is in addr space 4, the backend knows that it is constant and can do optimizations on it. After changing to addr space 1, those optimizations are gone.

The backend also knows because the constant flag is set on the global variable. Addrspace(4) is a kludge which is largely redundant with other mechanisms for indicating constants

If I am understanding you correctly, putting things in address space (4) has little to no performance benefit. @yaxunl seems to think otherwise. I agree that we can further constrain the address space (1) criteria, but I am getting conflicting viewpoints here on performance.

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

The two lit tests which changed from addr space 4 to 1 demonstrated that. In alias analysis, if a variable is in addr space 4, the backend knows that it is constant and can do optimizations on it. After changing to addr space 1, those optimizations are gone.

The backend also knows because the constant flag is set on the global variable. Addrspace(4) is a kludge which is largely redundant with other mechanisms for indicating constants

If backend can only rely on constant flag then we do not need put global variables in constant addr space.

Let's leave this patch as it is now. And revisit it if there are any regressions found.

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

The two lit tests which changed from addr space 4 to 1 demonstrated that. In alias analysis, if a variable is in addr space 4, the backend knows that it is constant and can do optimizations on it. After changing to addr space 1, those optimizations are gone.

The backend also knows because the constant flag is set on the global variable. Addrspace(4) is a kludge which is largely redundant with other mechanisms for indicating constants

If backend can only rely on constant flag then we do not need put global variables in constant addr space.

Let's leave this patch as it is now. And revisit it if there are any regressions found.

What about situations of a derived pointer to the global variable? For example

const int a[100] ;

foo(&a[50]);

If we put a in addr space 4, it is easy to deduce &a[50] is constant. If we put a in addr space 1, how does backend know &a[50] is constant?

What about situations of a derived pointer to the global variable? For example

const int a[100] ;

foo(&a[50]);

If we put a in addr space 4, it is easy to deduce &a[50] is constant. If we put a in addr space 1, how does backend know &a[50] is constant?

Everything using alias analysis looks back for the definition of the original object