This keeps the affine.for to gpu.launch conversion, which should
probably become an affine.parallel to gpu.launch conversion as well.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The maximum code-reuse flow would be affine.for -> (affine dependence analysis + parallelization) -> affine.parallel -> scf.parallel -> gpu.kernel, unless there is some specific information that can be expressed at both the affine and gpu levels but cannot be expressed at the scf level.
File | Lines | Comment |
---|---|---|
mlir/include/mlir/Conversion/Passes.td | 183–185 | You could perhaps add a TODO here or somewhere else to the effect: "Consider removing this in favor of affine.for -> affine.parallel detection followed by an affine.parallel -> scf.parallel -> gpu.launch conversion". |
mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp | 123 | You don't need to assign to forOp, and also this can just be an isa instead of dyn_cast: if (!isa<AffineForOp>(nested)) ... |
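To illustrate the isa vs. dyn_cast suggestion on line 123, here is a minimal, self-contained sketch. It does not use the real MLIR headers: Op, AffineForOp, isa, and dyn_cast below are simplified, hypothetical stand-ins for the LLVM-style casting utilities, and checkWithDynCast / checkWithIsa are invented names, not functions from SCFToGPU.cpp. The point is only that when the cast result is used purely as a yes/no answer, isa expresses the same check without an unused variable.

```cpp
#include <cassert>

// Hypothetical toy op hierarchy; the real MLIR classes are more involved.
struct Op {
  enum class Kind { AffineFor, Other };
  explicit Op(Kind k) : kind(k) {}
  Kind kind;
};

struct AffineForOp : Op {
  AffineForOp() : Op(Kind::AffineFor) {}
  // LLVM-style RTTI hook: reports whether `op` is an AffineForOp.
  static bool classof(const Op *op) { return op->kind == Kind::AffineFor; }
};

// Simplified analogues of llvm::isa and llvm::dyn_cast (assumptions, not the real API).
template <typename To> bool isa(const Op *op) { return To::classof(op); }

template <typename To> To *dyn_cast(Op *op) {
  return isa<To>(op) ? static_cast<To *>(op) : nullptr;
}

// Before: the cast result is stored in forOp but only tested for null.
bool checkWithDynCast(Op *nested) {
  AffineForOp *forOp = dyn_cast<AffineForOp>(nested);
  if (!forOp)
    return false;
  return true;
}

// After: isa states the intent directly and needs no assignment.
bool checkWithIsa(Op *nested) {
  if (!isa<AffineForOp>(nested))
    return false;
  return true;
}
```

Both functions return the same result for any input; the isa form is preferable only when the casted value itself is not needed afterwards.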