This transform looks for suitable vector transfers from global memory to shared memory and converts them to async device copies.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
| mlir/include/mlir/Dialect/NVGPU/TransformOps/NVGPUTransformOps.td | |
|---|---|
| 32 | at the what? |
| 34–35 | Why not call it bypass_l1 then? |
| 39 | It feels like a footgun: the target may not itself be erased, but nested ops definitely are, and we are not invalidating handles to those. I'd rather make this consume the target and produce a new handle for chaining (see the sketch below). |
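On the handle-invalidation comment at line 39: a minimal sketch of what consuming the target and producing a chainable result could look like on the C++ side, assuming the op gains a result handle; the op and accessor names (CreateAsyncGroupsOp, getTarget(), getResult()) are illustrative rather than taken from the patch.

```cpp
#include "mlir/Dialect/Transform/IR/TransformInterfaces.h"

// Hypothetical side-effect declaration: the incoming handle is consumed
// (its payload, including nested ops, may be rewritten or erased), and a
// fresh handle is produced so the result can be chained.
void transform::CreateAsyncGroupsOp::getEffects(
    SmallVectorImpl<MemoryEffects::EffectInstance> &effects) {
  consumesHandle(getTarget(), effects);  // old handle becomes invalid
  producesHandle(getResult(), effects);  // new handle for chaining
  modifiesPayload(effects);              // payload IR is modified in place
}
```
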
| mlir/lib/Dialect/NVGPU/Transforms/CreateAsyncGroups.cpp | |
|---|---|
| 26–30 | Nit: can this be extracted into a template and made common with the similar code below? (See the sketch after this table.) |
| 53 | Nit: could this elaborate on what happens with 2d masks right now? |
| 60 | Nit: "legal" is a bit too generic here. I'd rather say something like "currently supported by async copy". |
| 96 | Nit: I believe walk can take a lambda returning void, so we don't need WalkResult::advance() everywhere (see the sketch after this table). |
| 98 | Nit: no need to prefix with llvm::, here and below. |
| 191 | For the future: this should really be using the data layout mechanism, which would also fix the TODO related to alignment above (see the sketch after this table). |
| 192–193 | I'd consider returning a (silenceable) failure if useMMASync is set but cannot be honored, or at least documenting this attribute as a hint rather than a guarantee (see the sketch after this table). |
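On the template-extraction nit at lines 26–30: a hedged sketch of the kind of shared helper that could cover both the read and the write case; the helper name and the exact predicate are assumptions, not the actual checks in the patch.

```cpp
// Hypothetical shared predicate: vector.transfer_read and
// vector.transfer_write expose the same accessors, so a single template
// can replace the two near-identical checks.
template <typename TransferOpTy>
static bool isContiguousTransfer(TransferOpTy xferOp) {
  return isa<MemRefType>(xferOp.getShapedType()) &&  // memref source/dest
         xferOp.getVectorType().getRank() == 1 &&    // 1-D transfer
         xferOp.getPermutationMap().isMinorIdentity() &&
         !xferOp.hasOutOfBoundsDim();
}
```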
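On the walk nit at line 96: Operation::walk accepts a callback returning void, in which case no WalkResult plumbing is needed; WalkResult is only required when the walk should be able to interrupt or skip. A small sketch with illustrative variable names:

```cpp
// Collect candidate transfers with a void-returning callback; no
// WalkResult::advance() is needed at the end of the lambda.
SmallVector<Operation *> copyCandidates;
funcOp->walk([&](vector::TransferReadOp readOp) {
  copyCandidates.push_back(readOp);
});
```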
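On the data layout remark at line 191: a rough sketch of what a data-layout-based query could look like, which would also answer the alignment TODO; transferOp and vecType are placeholders for whatever the surrounding code has in scope.

```cpp
#include "mlir/Interfaces/DataLayoutInterfaces.h"

// Ask the closest enclosing data layout scope for size and alignment
// instead of hard-coding bit widths.
DataLayout dataLayout = DataLayout::closest(transferOp);
auto elementBits = dataLayout.getTypeSizeInBits(vecType.getElementType());
auto elementAlign = dataLayout.getTypeABIAlignment(vecType.getElementType());
```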
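On the useMMASync comment at lines 192–193: a hedged sketch of the silenceable-failure option inside the transform op's apply logic; getUseMmaSync() and supportsMmaSync() are hypothetical names used only for illustration.

```cpp
// Report a silenceable failure when the requested mma_sync form cannot be
// produced, instead of silently falling back to the default copy.
if (getUseMmaSync() && !supportsMmaSync(transferOp)) {  // hypothetical helper
  return emitSilenceableError()
         << "useMMASync was requested but this transfer cannot be lowered "
            "to an mma_sync-compatible async copy";
}
```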