MLIR symbol visibility has no internal kind, so this patch adds one.
Repository: rG LLVM Github Monorepo
Event Timeline
The motivation for adding internal is to enable demotion of global shared-memory variables into the kernel (there may be other ways to do this).
Here is the input MLIR code:
    memref.global "private" @gv1 : memref<1xi64,3>
    func.func @get_gv0_memref() {
      %0 = memref.get_global @gv1 : memref<1xi64,3>
      %c0 = arith.constant 0 : index
      %c4 = arith.constant 4 : i64
      memref.store %c4, %0[%c0] : memref<1xi64,3>
      return
    }
When generating PTX code for memref.global "private", we get:
    .shared .align 8 .b8 gv1[8];
    .visible .func get_gv0_memref()
    {
      .reg .b64 %rd<2>;
      mov.u64 %rd1, 4;
      st.shared.u64 [gv1], %rd1;
      ret;
    }
If the global were instead declared as memref.global "internal", the PTX code would look like this:
    .visible .func get_gv0_memref()
    {
      .reg .b64 %rd<2>;
      // demoted variable
      .shared .align 8 .b8 gv1[8];
      mov.u64 %rd1, 4;
      st.shared.u64 [gv1], %rd1;
      ret;
    }
These examples show the effect of internal: the shared-memory variable gv1 is demoted from module scope into the body of the function in the PTX code.
LLVM linkage and MLIR symbol visibility are orthogonal concepts and should not be confused. For example, MLIR func.func declarations always have private visibility, while their corresponding LLVM IR linkage can be external.
Inline comment on mlir/include/mlir/IR/SymbolTable.h, line 88:
I do not understand this comment. Could you expand it and explicitly state the difference between internal and private?
Thanks for the feedback. As I mentioned, I am not sure this is the right way. Any ideas are welcome!
My goal is to create
memref.global "internal" @gv1 : memref<1xi64,3>
And lower into:
llvm.mlir.global internal @gv1() {addr_space = 3 : i32} : !llvm.array<1 x i64>
Currently, memref.global only supports private, public, and nested.
I'm not really familiar with PTX, but do you know why it treats internal and private differently? Their semantic difference is minimal and shouldn't be relevant for PTX, I assume.
Is this just an oversight in the PTX lowering, or are you aware of any semantic difference there?
> My goal is to create
> memref.global "internal" @gv1 : memref<1xi64,3>
> And lower into:
> llvm.mlir.global internal @gv1() {addr_space = 3 : i32} : !llvm.array<1 x i64>
One could also add a flag to the MemRefToLLVM conversion that ensures that private visibility is converted into internal linkage. Maybe it even makes sense to always emit internal linkage for these memrefs.
> Currently, memref.global only supports private, public, and nested.
Again, this is symbol visibility, which is an MLIR internal construct and really only specifies where this symbol is visible inside the IR.
Thanks for the feedback. See the LLVM linkage types (internal vs. private): https://llvm.org/docs/LangRef.html#linkage-types
I could get rid of Internal in MLIR symbols if it is not desired. In this case, I will need to have code like below in MemRefToLLVM.cpp. I am not sure which way is better.
    LLVM::Linkage linkage;
    if (global.isPublic()) {
      linkage = LLVM::Linkage::External;
    } else {
      if (type.getMemorySpaceAsInt() == 3) {
        linkage = LLVM::Linkage::Internal;
      } else {
        linkage = LLVM::Linkage::Private;
      }
    }
The only difference is the visibility of the symbol in an ELF file. How does this affect the behaviour of PTX code generation? I assume PTX does not care about such things.
Especially considering that private does not even expose the symbol in the symbol table, why can a private global not be treated the same way an internal global is?
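To make the distinction under discussion concrete, here is a minimal sketch of how the LangRef describes the three linkage kinds mentioned in this thread. The enum, struct, and function names are hypothetical and are not part of LLVM's API. The sketch only encodes that a private symbol does not show up in the object file's symbol table, while an internal symbol shows up as a local symbol.

```cpp
// Hypothetical model (not LLVM's API) of the LangRef linkage notes.
enum class LinkageKind { Private, Internal, External };

struct SymbolProperties {
  bool inObjectSymbolTable;   // Does the symbol appear in the object file's symbol table?
  bool visibleToOtherModules; // Can another module reference the symbol?
};

inline SymbolProperties describe(LinkageKind linkage) {
  switch (linkage) {
  case LinkageKind::Private:
    // Accessible only within the current module; no symbol table entry.
    return {false, false};
  case LinkageKind::Internal:
    // Like private, but emitted as a local symbol (STB_LOCAL in ELF).
    return {true, false};
  case LinkageKind::External:
    // Globally visible and available for linking.
    return {true, true};
  }
  return {false, false};
}
```

In this model, private and internal differ only in the symbol table entry, which is the point the question above raises.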
> I could get rid of Internal in MLIR symbols if it is not desired. In this case, I will need to have code like below in MemRefToLLVM.cpp. I am not sure which way is better.
>
>     LLVM::Linkage linkage;
>     if (global.isPublic()) {
>       linkage = LLVM::Linkage::External;
>     } else {
>       if (type.getMemorySpaceAsInt() == 3) {
>         linkage = LLVM::Linkage::Internal;
>       } else {
>         linkage = LLVM::Linkage::Private;
>       }
>     }
I'm not a fan of encoding target-specific behaviour into general-purpose passes. Address spaces are fully target-specific, so treating 3 differently from the others is not what we want.
For almost all intents and purposes, private and internal behave the same way, so we might really just lower private visibility to internal linkage if PTX cannot deal with private. Ideally, this is done only when a certain pass option is set, to ensure that we do not break other existing flows.
Alternatively, the linkage itself could be added to memref.global, but that would leak LLVM internals up into higher level dialects that shouldn't really care about things like linkage.
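The pass-option idea above could look roughly like the following sketch. Everything in it is hypothetical: the enums are stand-ins for the real MLIR types, and `LoweringOptions`, `privateAsInternal`, and `chooseLinkage` are illustrative names, not existing MemRefToLLVM options. The sketch keeps the current mapping by default, switches non-public symbols to internal linkage only when the flag is set, and does not inspect the address space.

```cpp
// Hypothetical stand-ins for MLIR symbol visibility and LLVM dialect linkage.
enum class Visibility { Public, Private, Nested };
enum class Linkage { External, Private, Internal };

struct LoweringOptions {
  // Hypothetical flag; off by default so existing flows keep their behaviour.
  bool privateAsInternal = false;
};

// Maps symbol visibility to linkage without looking at the memory space.
inline Linkage chooseLinkage(Visibility visibility,
                             const LoweringOptions &options) {
  if (visibility == Visibility::Public)
    return Linkage::External;
  // Non-public symbols: private by default, internal only on request.
  return options.privateAsInternal ? Linkage::Internal : Linkage::Private;
}
```

With the flag left at its default, a private global would still lower to private linkage. A flow targeting NVPTX could set the flag to get internal linkage instead.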
Is it possible to fix this in the PTX backend then? A limitation in this backend should not be the reason to extend the symbol visibility that potentially affects all of MLIR.
I've sent a patch for review to fix the NVPTX backend limitation https://reviews.llvm.org/D154507