Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Adding some comgr folks in the interests of making sure this revision isn't subtly wrong
I'm potentially adding an extra field in gpu::TargetOptions for passing extra cmd arguments to tools, for example passing --max-reg-count to ptxas on NVIDIA. The question here is, what would be the AMD tool where I should pass those extra flags here?
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | I'm in the process of deleting the control libraries. Everything except wave64 currently has a clear path to deletion |
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | What would be the new mechanism? Also, any ideas on how to address compatibility with current ROCm installs? Should we ship this now and then update? |
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | For the most part 'work correctly' as if it were a normal library. I would just not bother adding configuration points for the optional ones and just use the conservative default setting. Something will likely still be required for specific subtarget handling (and wave64 vs. wave32 may end up as complete separate builds) |
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | Ok, but in the new mechanism without amdgcn/bitcode control libraries, is there an option for requesting fast math and all of this other options? I mean this attribute only specifies that compilation should use fast math, for example: rocdl.target<flags = {fast, wave64}> Only says the compilation mechanism should use fast math. So we could leave those here? The actual implementation and using those control libraries is done line 205 in mlir/lib/Target/LLVM/ROCDL/Target.cpp |
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | Ideally the optimization of the libraries happens based on the fast math flags and attributes of the call sites. For example, the callers should mark their outgoing arguments with nofpclass(nan inf) for finite only. Attributor will then be able to propagate that into the library functions assuming all callers also use fast math For sqrt, the default will be correct. the backend will look for llvm.sqrt calls with appropriate !fpmath metadata for the not correctly rounded case. For general unsafe/approximate functions, it will be driven off some combination of afn or contract flags on the call. DAZ checks can also be folded out based on the known denormal mode by attributor |
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | Ok I see, what about existing ROCm installations, again should we ship this as it is, and change it in the future? I mean, I can just remove those control libs . Also, how about we keep these in the target attribute? as they could be used by a pass aware of the target that swaps non swap to fast math? |
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | Here the "target" attribute is basically the "driver" of the GPU kernel compiler. It defines "how to invoke LLVM and the assembler to get a binary back" |
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | Ok, just to be clear, you are removing the libraries but keeping the control variables? Like you plan to remove oclc_daz_opt.bc but keeping the respective control variable? Which libraries are going to be removed? Are oclc_isa_version_, oclc_abi_version_, oclc_wavefrontsize64 & ockl also going to be removed? Target attributes are the new mechanism in MLIR for handling compilation of GPU modules, however they also possess information on the target for example: #rocdl.target<chip="gfx90a", flags = {fast, noWave64, unsafe_math}> Says that the GPU module should be compiled for chip gfx90a, using fast, unsafe_math and using wave32 flags. |
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | There's a few moving parts here. The first set of optional variables (unsafe math, DAZ, sqrt) should be completely gone in short order. I have a few more pieces to implement but I think are fully deletable within 1-2 months. These conceptually do not belong to the target, so I don't think you want to handle them here. The control library for the ABI version will likely be gone soon, and replaced with clang directly emitting the global definition. The target (isa version and wave32/wave64) case is trickier. Long term I have my eye on deleting them, but it's not close enough to have a concrete idea of exactly what it looks like. |
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td | ||
---|---|---|
453–459 ↗ | (On Diff #545884) | Ok, so a target in here doesn't map 1-1 to an LLVM target, these options inside a target are closer to cmd line options that would be passed to clang. Ok, I'll change these to emit global definitions and leave linking to wave64. |
Adds a couple of syntax tests & fixes a bug not displaying the link option in the attribute.
So, this code looks reasonable to me (and, as a bonus, I can tell how to invoke this when I'm generating binaries as a library), but I don't want to hand out a green check without @arsenm or someone else who's more involved in the intricacies of how LLVM gets called for GPU compilations signing off, unless we can't get a hold of anyone.
We're not committing to a stable API here, it's a first version that we need to plumb together and learn from.
If we want to be conservative we can just remove every flag and add them later when @arsenm can provide more guidance, otherwise that seems fine to me to land and iterate in tree.
Are these meant to be called before main() like this?