This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Linalg]: Add memory space to linalg transform::PromoteOp
ClosedPublic

Authored by AviadCo on Aug 29 2023, 2:19 AM.

Details

Summary

This patch allows supplying an optional memory space for the promoted buffer.
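
For illustration, usage would look roughly like the following sketch. The `memory_space` attribute name and placement are assumptions based on this revision's summary; `operands_to_promote` and the surrounding transform-sequence syntax are the existing upstream conventions of the time.

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  // Find the op to promote.
  %matmul = transform.structured.match ops{["linalg.matmul"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // Promote operand 0 into a buffer placed in a specific memory space
  // (here the GPU workgroup space, purely as an example target).
  %promoted = transform.structured.promote %matmul
    { operands_to_promote = [0],
      memory_space = #gpu.address_space<workgroup> }
    : (!transform.any_op) -> !transform.any_op
}
```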

Diff Detail

Event Timeline

AviadCo created this revision.Aug 29 2023, 2:19 AM
Herald added a project: Restricted Project.Aug 29 2023, 2:19 AM
AviadCo requested review of this revision.Aug 29 2023, 2:19 AM
AviadCo updated this revision to Diff 555006.Aug 31 2023, 6:00 AM

Updated pass name and used MemRef copy instead of DMA.

Hello @AviadCo, thanks for your contribution.

I have a question regarding your intended usage of this: do you have a pass pipeline in which this fits (and if so, is it possible to share it)?
The reason I am asking is that most passes in Linalg should be considered "test" passes, as we do not have automatic profitability heuristics upstream.

Instead we have been switching to implementing key functionality as transform dialect ops and apply / test them independently of a pass pipeline.
Separately, pass pipelines with proper heuristics can be built from these building blocks.

Here is a recent example of adding a new transform and a new test, in case it is useful: https://reviews.llvm.org/D153420.

You could also directly target a memory space attribute rather than using the hardcoded constants that are backend-specific and can be confusing (see also: https://discourse.llvm.org/t/confused-by-inconsistencies-in-gpu-magic-constants/72041).
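
For illustration, the contrast the reviewer is drawing looks roughly like this (a sketch; the element type and shape are arbitrary):

```mlir
// Backend-specific magic constant: the meaning of "3" depends on which
// target happens to interpret it, which is the confusion referenced above.
%a = memref.alloc() : memref<16x16xf32, 3>

// Dedicated memory space attribute: self-documenting and verified by the
// dialect that defines it.
%b = memref.alloc() : memref<16x16xf32, #gpu.address_space<workgroup>>
```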

mlir/include/mlir/Dialect/Linalg/Passes.td
157 ↗(On Diff #555006)

nit: dimensions

mlir/lib/Dialect/Linalg/Transforms/UpdateAddressSpace.cpp
147 ↗(On Diff #555006)

MemRefType::Builder(srcMemRefType).setAddressSpace(...) or something similar please

159 ↗(On Diff #555006)

You would likely be better off here with a linalg::CopyOp, as it can further be mapped to e.g. GPU thread ids and vector load/store using something like: https://reviews.llvm.org/D154836
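
The distinction being suggested, roughly sketched (types are illustrative assumptions): memref.copy is opaque to later structured-codegen transforms, while linalg.copy is itself a structured op that can be tiled, vectorized, and mapped like any other linalg op.

```mlir
// Opaque copy: later transformations treat this as a black box.
memref.copy %src, %promoted
  : memref<16x16xf32> to memref<16x16xf32, #gpu.address_space<workgroup>>

// Structured copy: a linalg op that downstream transforms can tile and
// map to e.g. GPU thread ids.
linalg.copy ins(%src : memref<16x16xf32>)
            outs(%promoted : memref<16x16xf32, #gpu.address_space<workgroup>>)
```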

Actually, here is a contribution that seems related: https://reviews.llvm.org/D144666

Do you see a way to generalize and/or reuse the existing one?

@nicolasvasilache thanks for feedback, I really appreciate it!

I will take a look at Transform Dialect and the references you added here and see how I can generalize a solution.

Thanks,
Aviad Cohen

AviadCo updated this revision to Diff 555975.Sep 6 2023, 12:24 AM

Updated to use the linalg transform::PromoteOp instead of a dedicated pass.

AviadCo retitled this revision from [mlir][linalg]: Add LinalgDMAAddressSpace pass to [mlir][Linalg]: Add memory space to linalg transform::PromoteOp.Sep 6 2023, 12:25 AM
AviadCo edited the summary of this revision. (Show Details)

Actually, here is a contribution that seems related: https://reviews.llvm.org/D144666

Do you see a way to generalize and/or reuse the existing one?

I really appreciate your references. I now see that I can use the Linalg transform::PromoteOp with a small modification to add memory space support.
Not related to this patch, but I am a bit confused as to why the GPU dialect is so deeply connected to this transform. IMO we should avoid that coupling. Moreover, I am not using the GPU dialect at all but still need to specify the memory space.

Can you please review this new patch?

nicolasvasilache accepted this revision.Sep 6 2023, 4:52 AM

@AviadCo thanks this looks good to me.

As to the existing state, I suspect the person who added these patterns wanted to implement some simple heuristic to raise the level of control / automation in the transform but ended up coupling the GPU dialect with it in the process.
I'd welcome a refactoring to decouple these concerns indeed.
The operations used for alloc and copy could also be made parametric to avoid hardcoding.

This revision is now accepted and ready to land.Sep 6 2023, 4:52 AM
This revision was landed with ongoing or failed builds.Sep 7 2023, 7:35 AM
This revision was automatically updated to reflect the committed changes.


Thanks for reviewing so fast!

I actually have some ideas to improve the promotion a bit (e.g. not copying outputs before the linalg operation if they have no uses inside the linalg block, meaning the output is not an "in/out" operand), so I believe I will make further commits to PromoteOp.