This is an archive of the discontinued LLVM Phabricator instance.

[WIP] Supporting C++ based kernel languages on AMDGPU Target
AbandonedPublic

Authored by yaxunl on Dec 9 2016, 11:54 AM.
Tokens
"Like" token, awarded by whchung.

Details

Summary

In C++ based languages targeting GPUGPU, e.g. HCC, OpenMP or Cuda, it is desirable to keep the AST free of explicit address space qualifier for simplified parsing and type checking. We can assume that all the pointers are generic pointers which can pointing to any real address spaces in the target IR. This results in a consistent and clean AST and therefore IR since every pointer is a generic pointer.

In Clang codegen, when we need to emit global variables or allocas in real address space, we cast the resulting pointer to a generic pointer and put them in the address map. Later on, when we need to refer to the address, we always get a generic pointer. In this way, we can keep the translation of AST consistent.

Currently Clang assumes that the address space for a generic pointer (a pointer which can pointing to any real memory region) is 0, which is correct for x86 and nvptx target, but is not correct for some other targets, e.g., amdgcn target, for which the generic address space is 4.

This patch introduce a virtual function TargetInfo::getDefaultTargetAddressSpace which allows each target to specify the default target address space based on language options. It also introduces ASTContext::getTargetGlobalAddressSpace and ASTContext::getTargetConstantAddressSpace as an abstraction for a unified approach to support different languages on address space aware targets, e.g. amdgcn and nvptx.

It also fixes various places in codegen for incorrect assumptions about default address space.

With this change, Clang will be able to generate correct IR for C++ based kernel languages e.g. HCC, OpenMP and Cuda for a greater range of targets.

It will not change the emitted IR for targets using 0 as generic address space. For these targets, since alloca returns generic pointer, they usually require some LLVM passes to cast alloca to real address space. This patch can help those targets generate correct alloca in Clang codegen.

This patch was co-authored by Yaxun (Sam) Liu and Wen-Heng (Jack) Chung.

Diff Detail

Event Timeline

yaxunl updated this revision to Diff 80923.Dec 9 2016, 11:54 AM
yaxunl retitled this revision from to Allow target to specify default address space for codegen.
yaxunl updated this object.
yaxunl added a subscriber: cfe-commits.
yaxunl updated this object.Dec 9 2016, 11:56 AM
rjmccall edited edge metadata.Dec 9 2016, 1:08 PM

What address space should local variables be in?

yaxunl added a comment.Dec 9 2016, 1:10 PM

What address space should local variables be in?

should always be 0 since that's llvm convention.

Ok, I'm not understanding this, then.

I declare a local variable:

int x;

According to you, x should be in the generic address space, i.e. the private address space. And at the LLVM IR level, this will be implemented with an AllocaInst, which always creates memory in the private address space, i.e. LLVM address space 0.

But Sema believes that the l-value 'x' has type int, without an address space, and therefore that '&x' has type int*, again without an explicit address space. And you want to say that that type actually points into a different address space, and to make sure that that pointer is in LLVM address space 4.

Something about this does not work.

I suspect that what you actually want to do in order to implement HCC is change SemaType so that 'int*' is implicitly interpreted as 'int __something *'. But I don't know how not having explicit address spaces actually works in the HCC language design, given that it's based on C++.

Because, notably, if you do that, then an attempt to pass &x as an int* will fail, which means this isn't really C++ anymore... and yet that appears to be exactly what you want.

Because, notably, if you do that, then an attempt to pass &x as an int* will fail, which means this isn't really C++ anymore... and yet that appears to be exactly what you want.

How about when generating alloca instruction, I insert addrspacecast to the default address space, then everything is in default address space.

Because, notably, if you do that, then an attempt to pass &x as an int* will fail, which means this isn't really C++ anymore... and yet that appears to be exactly what you want.

How about when generating alloca instruction, I insert addrspacecast to the default address space, then everything is in default address space.

My question is really about your language design. You have multiple address spaces in the implementation, but you're (apparently?) pretending that they're all part of the same address space in the source language. How is that expected to work? Does the default address space embed all other address spaces?

Because, notably, if you do that, then an attempt to pass &x as an int* will fail, which means this isn't really C++ anymore... and yet that appears to be exactly what you want.

How about when generating alloca instruction, I insert addrspacecast to the default address space, then everything is in default address space.

My question is really about your language design. You have multiple address spaces in the implementation, but you're (apparently?) pretending that they're all part of the same address space in the source language. How is that expected to work? Does the default address space embed all other address spaces?

Yes the default address space embeds all other address spaces.

yaxunl updated this revision to Diff 81309.Dec 13 2016, 2:51 PM

Cast alloca to default address space.

Because, notably, if you do that, then an attempt to pass &x as an int* will fail, which means this isn't really C++ anymore... and yet that appears to be exactly what you want.

How about when generating alloca instruction, I insert addrspacecast to the default address space, then everything is in default address space.

My question is really about your language design. You have multiple address spaces in the implementation, but you're (apparently?) pretending that they're all part of the same address space in the source language. How is that expected to work? Does the default address space embed all other address spaces?

Yes the default address space embeds all other address spaces.

Since the default address space (generic address space 4) can represent all other address spaces, we want to use it uniformly. For any pointer which is not in the default address space, e.g. alloca, we want to cast it to default address space and then use it. Therefore, in the address space agnostic languages, we can assume everything is in the default address space. Our backend is able to handle load/store in the default address space.

yaxunl updated this revision to Diff 89834.Feb 26 2017, 8:36 PM
yaxunl retitled this revision from Allow target to specify default address space for codegen to [WIP] Specify default address space for C++ on AMDGPU Target.

Synch up to trunk.
Change bitcast to pointer cast in varous places to accommodation addr space difference.

yaxunl updated this revision to Diff 89899.Feb 27 2017, 10:15 AM
yaxunl retitled this revision from [WIP] Specify default address space for C++ on AMDGPU Target to [WIP] Supporting C++ based kernel languages on AMDGPU Target.
yaxunl edited the summary of this revision. (Show Details)

Fix invalid bitcast for global variables.
Add run line for amdgcn target for a Cuda lit test.

jlebar added a subscriber: jlebar.Feb 27 2017, 10:38 AM
yaxunl updated this revision to Diff 89944.Feb 27 2017, 3:11 PM

Fixed invalid bitcasts due to vtable.
Added run line for amdgcn target to more Cuda lit tests.

yaxunl updated this revision to Diff 90243.Mar 1 2017, 3:19 PM
yaxunl edited the summary of this revision. (Show Details)

Once you rebase, can you also update AMDGPUTargetInfo::getDWARFAddressSpace to use AddrSpaceKind?

yaxunl abandoned this revision.Mar 21 2017, 8:31 AM

We consider to switch to use 0 as generic address space and abandon this since 0 as generic address space is more agreeable to both LLVM and Clang's convention.

Okay. That does seem more sensible than trying to change everything around the other way.