This transform looks for suitable vector transfers from global memory to shared memory and converts them to async device copies.
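As a rough illustration of the kind of candidate the transform matches (a hypothetical sketch, not the patch's code: the helper name is made up, numeric NVVM address spaces 0 for global and 3 for shared/workgroup memory are assumed, and the transfer-op accessor names have changed across MLIR versions):

```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"

using namespace mlir;

/// Hypothetical predicate: does this transfer_write complete a global->shared
/// copy that could be turned into an async device copy? The real checks also
/// cover masks, permutation maps, and element sizes in more detail.
static bool isCandidateGlobalToSharedCopy(vector::TransferWriteOp write) {
  // Destination must be shared (workgroup) memory; 3 is the NVVM shared space.
  auto dstType = dyn_cast<MemRefType>(write.getSource().getType());
  if (!dstType || dstType.getMemorySpaceAsInt() != 3)
    return false;
  // The written vector must come directly from a transfer_read of global
  // memory (address space 0).
  auto read = write.getVector().getDefiningOp<vector::TransferReadOp>();
  if (!read)
    return false;
  auto srcType = dyn_cast<MemRefType>(read.getSource().getType());
  if (!srcType || srcType.getMemorySpaceAsInt() != 0)
    return false;
  // Only contiguous, unmasked, identity-mapped transfers are currently
  // supported by async copy (cf. the review comments below).
  if (read.getMask() || write.getMask())
    return false;
  return read.getPermutationMap().isMinorIdentity() &&
         write.getPermutationMap().isMinorIdentity();
}
```

Matched read/write pairs are then rewritten to nvgpu.device_async_copy ops, collected into async groups and synchronized with the corresponding group/wait ops.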
Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline
mlir/include/mlir/Dialect/NVGPU/TransformOps/NVGPUTransformOps.td

| Line | Comment |
|---|---|
| 32 | at the what? |
| 34–35 | Why not call it bypass_l1 then? |
| 39 | It feels like a footgun: the target itself may not be erased, but nested ops definitely are, and we are not invalidating the handles to those. I'd rather make this consume the target and produce a new handle for chaining (see the sketch after this table). |
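A minimal sketch of the consume-and-produce interface suggested for line 39, assuming the usual transform-dialect side-effect helpers; the op class name here is illustrative, and the helper signatures and header location have changed across MLIR versions:

```cpp
#include "mlir/Dialect/Transform/IR/TransformInterfaces.h"

using namespace mlir;

// Illustrative only: mark the target handle as consumed, so handles pointing
// into the rewritten region are invalidated by the transform interpreter, and
// return a fresh handle that later transforms can chain on.
void transform::CreateAsyncGroupsOp::getEffects(
    SmallVectorImpl<MemoryEffects::EffectInstance> &effects) {
  consumesHandle(getTarget(), effects); // invalidates the incoming handle
  producesHandle(getResult(), effects); // new handle for chaining
  modifiesPayload(effects);             // the payload IR is rewritten
}
```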
mlir/lib/Dialect/NVGPU/Transforms/CreateAsyncGroups.cpp

| Line | Comment |
|---|---|
| 25–29 | Nit: can this be extracted into a template and made common with the similar code below? (See the template sketch after this table.) |
| 52 | Nit: could this elaborate on what happens with 2-D masks right now? |
| 59 | Nit: "legal" is a bit too generic here. I'd rather say something like "currently supported by async copy". |
| 95 | Nit: I believe walk can take a lambda returning void, so we don't need WalkResult::advance() everywhere. (See the walk sketch after this table.) |
| 97 | Nit: no need to prefix with llvm::, here and below. |
| 190 | For the future: this should really be using the data layout mechanism, which would also fix the TODO related to alignment above. (See the data-layout sketch after this table.) |
| 191–192 | I'd consider returning a (silenceable) failure if useMMASync is set and cannot be achieved, or at least documenting this attribute as a hint rather than a guarantee. (See the failure-handling sketch after this table.) |
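For the 25–29 nit, one hypothetical shape of the shared helper; the exact checks the patch performs may differ, the point is only that vector.transfer_read and vector.transfer_write can share a single templated predicate:

```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"

using namespace mlir;

// Hypothetical templated helper covering both vector.transfer_read and
// vector.transfer_write: contiguous, unmasked, rank-1, identity-mapped
// transfers only.
template <typename TransferOpTy>
static bool isContiguousUnmaskedTransfer(TransferOpTy xferOp) {
  return !xferOp.getMask() && xferOp.getVectorType().getRank() == 1 &&
         xferOp.getPermutationMap().isMinorIdentity();
}
```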
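For line 95, Operation::walk indeed accepts a callback returning void, in which case the walk cannot be interrupted and no WalkResult bookkeeping is needed; roughly:

```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Collect candidate transfer_write ops with a void-returning callback; no
// WalkResult::advance() is required because the walk always continues.
static SmallVector<vector::TransferWriteOp> collectCandidates(Operation *root) {
  SmallVector<vector::TransferWriteOp> candidates;
  root->walk([&](vector::TransferWriteOp writeOp) {
    if (isCandidateGlobalToSharedCopy(writeOp)) // predicate sketched earlier
      candidates.push_back(writeOp);
  });
  return candidates;
}
```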
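For line 190, the data layout mechanism referred to is MLIR's DataLayout query interface; a sketch of querying element sizes through it rather than hard-coding them (the return type of these queries has changed across MLIR versions):

```cpp
#include "mlir/IR/Operation.h"
#include "mlir/Interfaces/DataLayoutInterfaces.h"

using namespace mlir;

// Query the element size through the data layout active at `op` instead of
// assuming a fixed bit width; the same mechanism can answer the alignment
// question from the TODO mentioned in the comment.
static unsigned getElementSizeInBytes(Operation *op, Type elementType) {
  const DataLayout &dataLayout = DataLayout::closest(op);
  return dataLayout.getTypeSize(elementType);
}
```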
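And for 191–192, a sketch of the silenceable-failure alternative, assuming the transform dialect's standard DiagnosedSilenceableFailure machinery; the helper below and the way the "honored" bit would be computed are made up for illustration:

```cpp
#include "mlir/Dialect/Transform/IR/TransformInterfaces.h"

using namespace mlir;

// Illustrative: called from the transform op's apply implementation. Turns an
// unsatisfied useMMASync request into a silenceable failure instead of
// silently treating the attribute as a best-effort hint.
static DiagnosedSilenceableFailure
checkMmaSyncHonored(transform::TransformOpInterface transformOp,
                    bool useMMASync, bool mmaSyncCopiesCreated) {
  if (useMMASync && !mmaSyncCopiesCreated)
    return transformOp.emitSilenceableError()
           << "useMMASync is set but no mma.sync-compatible async copy could "
              "be created";
  return DiagnosedSilenceableFailure::success();
}
```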