[MLIR] Add std.atomic_rmw op
Needs Review (Public)

Authored by flaub on Tue, Feb 11, 6:05 AM.

Details

Summary

The RFC for this op is here: https://llvm.discourse.group/t/rfc-add-std-atomic-rmw-op/489

The std.atomic_rmw op provides a way to support read-modify-write
sequences with data race freedom. It is intended to be used in the lowering
of an upcoming affine.atomic_rmw op which can be used for reductions.

A lowering to LLVM is provided with 2 paths:

  • Simple patterns: llvm.atomicrmw
  • Everything else: llvm.cmpxchg
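
For readers less familiar with the two paths, here is a rough C++ analogy using std::atomic. It is not the lowering code from this diff. The helper names atomicAdd and atomicUpdate are hypothetical, and returning the previously stored value is a choice made for this sketch.

```cpp
#include <atomic>
#include <cassert>

// Path 1: the update is a single well-known operation (here, integer add),
// so one hardware read-modify-write suffices. This is analogous to the
// "simple patterns -> llvm.atomicrmw" path.
inline int atomicAdd(std::atomic<int> &cell, int delta) {
  return cell.fetch_add(delta); // returns the value before the update
}

// Path 2: the update is an arbitrary pure function of the loaded value, so
// a compare-and-swap is retried until no other thread intervened. This is
// analogous to the "everything else -> llvm.cmpxchg" path.
template <typename T, typename Fn>
T atomicUpdate(std::atomic<T> &cell, Fn body) {
  T loaded = cell.load();
  // On failure, compare_exchange_weak overwrites `loaded` with the current
  // value, so `body` is re-evaluated against fresh data on each retry.
  while (!cell.compare_exchange_weak(loaded, body(loaded))) {
  }
  return loaded; // value observed just before the successful swap
}
```

The body passed to atomicUpdate may run more than once, so it must be free of side effects. That is one reason the simple path is preferable whenever the body maps to a single operation.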

Diff Detail

Unit Tests: Failed

Time      Test
1,390 ms  MLIR.mlir-cpu-runner::bare_ptr_call_conv.mlir
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/mlir-opt /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/bare_ptr_call_conv.mlir -convert-loop-to-std -convert-std-to-llvm='use-bare-ptr-memref-call-conv=1' | mlir-cpu-runner -shared-libs=/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/libmlir_runner_utils.so -entry-point-result=void | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/bare_ptr_call_conv.mlir
1,330 ms  MLIR.mlir-cpu-runner::linalg_integration_test.mlir
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/mlir-opt /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/linalg_integration_test.mlir -convert-linalg-to-llvm | mlir-cpu-runner -e dot -entry-point-result=f32 -shared-libs=/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/libcblas.so,/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/libcblas_interface.so | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/linalg_integration_test.mlir
1,390 ms  MLIR.mlir-cpu-runner::simple.mlir
Script: -- : 'RUN: at line 1'; mlir-cpu-runner /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/simple.mlir | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/simple.mlir
1,310 ms  MLIR.mlir-cpu-runner::unranked_memref.mlir
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/mlir-opt /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/unranked_memref.mlir -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | mlir-cpu-runner -e main -entry-point-result=void -shared-libs=/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/libmlir_runner_utils.so,/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/libcblas.so,/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/libcblas_interface.so | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/unranked_memref.mlir
1,320 ms  MLIR.mlir-cpu-runner::utils.mlir
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/mlir-opt /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/utils.mlir -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | mlir-cpu-runner -e print_0d -entry-point-result=void -shared-libs=/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/libmlir_runner_utils.so | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/mlir/test/mlir-cpu-runner/utils.mlir --check-prefix=PRINT-0D

Event Timeline

jbruestle added inline comments. Tue, Feb 11, 10:48 AM
mlir/include/mlir/Dialect/StandardOps/Ops.td
230

I think you just mean 'block argument' not induction variable here (and below in a few places), since it's not iterating over anything.

252

Maybe rename this to getInitialValue() or getLoadedValue(), or something similar?

This revision now requires changes to proceed. Tue, Feb 11, 10:48 AM
flaub updated this revision to Diff 243960. Tue, Feb 11, 12:43 PM
  • Review updates
flaub marked 2 inline comments as done. Tue, Feb 11, 12:43 PM
flaub updated this revision to Diff 243962. Tue, Feb 11, 12:45 PM
  • Remove iv
flaub updated this revision to Diff 243963. Tue, Feb 11, 12:48 PM
  • Remove iv
flaub added a comment. Tue, Feb 11, 1:01 PM

Review addressed in latest push.

jbruestle accepted this revision. Tue, Feb 11, 1:16 PM
This revision is now accepted and ready to land. Tue, Feb 11, 1:16 PM
rriddle requested changes to this revision. Tue, Feb 11, 1:25 PM
rriddle added inline comments.
mlir/include/mlir/Dialect/StandardOps/Ops.td
227

typo: indicies -> indices

239

Wrap these in a mlir code block

mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2507

static functions should be in the global namespace and marked as static. Only classes should be placed within anonymous namespaces.

2540

Drop trivial braces.

2563

Same here.

2619

Top-level comments should be ///

2677

Don't create SmallVectors for things like this. Use ArrayRef or arrays.

2705

Same here.

mlir/lib/Dialect/StandardOps/Ops.cpp
2979

Remove trivial braces.

I would have expected that this could be covered by ODS constraints.

This revision now requires changes to proceed. Tue, Feb 11, 1:25 PM
flaub updated this revision to Diff 244020. Tue, Feb 11, 3:40 PM
flaub marked 10 inline comments as done.
  • Review updates
flaub marked 3 inline comments as not done. Tue, Feb 11, 3:40 PM
flaub added inline comments.
mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2507

I'm OK with this, but I'm unfamiliar with this convention. What's the purpose behind having functions being static instead of being in the anonymous namespace? I was always under the impression that the two were functionally equivalent and that the more 'C++' way was to use anonymous namespaces.

Also, should I close out the namespace here and then add the static functions and then re-open the anonymous namespace? Or would it make sense to move these up above?

mlir/lib/Dialect/StandardOps/Ops.cpp
2979

Did you have a specific trait in mind? I'm comparing the parent's element type to the op's operand type. I didn't see anything in OpBase.td at first glance.

rriddle added inline comments. Tue, Feb 11, 3:51 PM
mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2507
mlir/lib/Dialect/StandardOps/Ops.cpp
2979

Hmmm, I thought there was one for this already. I think a lot of the current usages are abusing mlir::getElementTypeOrSelf.

rriddle added inline comments. Tue, Feb 11, 3:52 PM
mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2507

Also, should I close out the namespace here and then add the static functions and then re-open the anonymous namespace? Or would it make sense to move these up above?

Whichever one makes sense. You can close the namespace or hoist the functions.
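
As an illustration of the convention discussed here (the LLVM coding standards ask for anonymous namespaces to be kept small and used for class declarations, with file-local functions marked static), a minimal sketch might look like the following. Counter and bump are hypothetical names, not code from this diff.

```cpp
#include <cassert>

namespace {
// Classes have no `static` linkage specifier, so an anonymous namespace is
// the way to give them internal linkage.
struct Counter {
  int value = 0;
};
} // namespace

/// File-local helper: declared `static` outside the anonymous namespace, so
/// a reader sees its internal linkage at the definition without having to
/// search for an enclosing `namespace {`.
static int bump(Counter &counter, int by) {
  counter.value += by;
  return counter.value;
}
```

Both forms give the function internal linkage. The difference is readability in large files, where the opening of an anonymous namespace may be far from the function being read.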

flaub updated this revision to Diff 244026. Tue, Feb 11, 4:00 PM
flaub marked an inline comment as not done.
  • Drop trivial braces
  • Trivial braces
flaub marked an inline comment as done. Tue, Feb 11, 4:06 PM
flaub added inline comments.
mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2507

Thanks for the link!

mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2645

This seems like it'd be better as a loop.while + progressive lowering?

mlir/test/Conversion/StandardToLLVM/convert-to-llvmir.mlir
869

So this is interesting to me.
Weren't you and/or @jbruestle advocating that we should have the reduction op encoded as an attribute in the case of affine.parallel_for with reduction semantics?
It seems a very similar scenario to me. I'd be interested in where you draw the distinction between "encode as an attribute" and "just use a region".

flaub marked 2 inline comments as done. Tue, Feb 11, 5:12 PM
flaub added inline comments.
mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2645

I agree that it'd be a lot nicer to write the loop at a higher level, but it wasn't clear to me where this would go. Also, the determination of whether to use a cmpxchg or an atomicrmw is really specific to the LLVM lowering, so we wouldn't want to use a loop in the case of simple bodies that can map to a single intrinsic/op at the lower level. I'm also thinking about how we will want to have a lowering from std to SPIR-V or OpenMP, in which case a loop may or may not make sense for those lowerings.

mlir/test/Conversion/StandardToLLVM/convert-to-llvmir.mlir
869

I think we initially thought having a closed attribute would be good, but it seemed that providing the ability to lower arbitrary reductions into cmpxchg wasn't too hard to do, and it was easy enough to identify the simple cases that do lower to a single intrinsic. Our current plan is to still use an enum at the top level (the tile dialect), but then use a region for affine and below. The upcoming affine.atomic_rmw should basically mirror the standard one with regard to region vs attribute.

mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2645

Not for this revision but in the future there are some nice tricks we could play here.
Your Conversion could very well decide to introduce a higher level loop construct, rewrite the region into that and let the rewrite infrastructure stitch the pieces together.
In other words, I wasn't advocating that the atomic behavior should leak into the other targets, but that you could decide during lowering to introduce a higher level construct that is implemented and tested independently.

Great work! I only have some nits.

mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2510

I'd appreciate some documentation on this function and below.

2607

Type conversion may fail and return null, please check the result.

2610

Nit: rewriter.replaceOpWithNewOp<LLVM::AtomicRMWOp>(op, resultType, ...);

2645

This is similar to the discussion @nicolasvasilache and I had about memory copies, and is worth discussing in general. One practical thing I'd like to point out: we need to make sure we don't introduce cyclic library dependencies, and it may be tricky.

2702

Nit: normally, single-result ops should be convertible to Value so you shouldn't be needing the .res() part

flaub updated this revision to Diff 244317. Wed, Feb 12, 6:32 PM
flaub marked 4 inline comments as done.
  • Remove stdx.
  • Review updates
  • Remove iv
  • Review updates
  • Drop trivial braces
  • Trivial braces
  • Address feedback
flaub added a comment. Wed, Feb 12, 6:33 PM

@rriddle Are there any more blockers that need to be addressed?

ftynse accepted this revision. Fri, Feb 14, 1:55 AM

LGTM when extra documentation is added as discussed on Discourse.

mlir/include/mlir/Dialect/StandardOps/Ops.td
235

Could you please describe the restrictions on the body contents of the atomic region as discussed in the RFC?

rriddle accepted this revision. Fri, Feb 14, 9:28 PM

LGTM after comments are resolved.

mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2501

Can you document this struct, please?

2508

Use /// for all top-level comments. Here and below.

2727

For single element things you should be able to pass the values directly. Is there a problem you are running into doing that?

This revision is now accepted and ready to land. Fri, Feb 14, 9:28 PM
flaub updated this revision to Diff 245584. (Edited) Wed, Feb 19, 11:08 PM

Simplified design based on RFC feedback

flaub updated this revision to Diff 245585. Wed, Feb 19, 11:12 PM
  • Comments
flaub updated this revision to Diff 245586. Wed, Feb 19, 11:13 PM
  • Fix example
rriddle requested changes to this revision. Wed, Feb 19, 11:51 PM

Thanks for the update Frank! Added a few comments.

mlir/include/mlir/Dialect/StandardOps/Ops.td
262

Should this be a more constrained type, like 'IntegerOrFloatLike'?

mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2698

This is invalid: *all* IR mutations need to go through the rewriter. Some pattern drivers, like DialectConversion, which is being used here, will undo transformations if something goes wrong. If something is done outside of the rewriter, this leads to invalid code or crashes. It seems like you want to use rewriter.replaceOp here.

2764

Can we keep this ordered?

mlir/lib/Dialect/StandardOps/Ops.cpp
2919

Could you switch to the declarative format? It will format the enum as a string instead of a keyword for now, but that is worth it to remove all of this parsing code.
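
To make the comment on line 2698 concrete, here is a toy model of why a mutation that bypasses the rewriter is a problem when the driver can roll back. ToyRewriter and demoFailedConversion are hypothetical stand-ins written for this sketch. They are not the MLIR API, and the real DialectConversion bookkeeping works differently in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rewriter that journals every mutation so that a failed
// conversion can be undone.
class ToyRewriter {
public:
  explicit ToyRewriter(std::vector<std::string> &opList) : ops(opList) {}

  // Replace the op at `index` and remember how to restore the old one.
  void replaceOp(std::size_t index, std::string replacement) {
    std::string old = ops[index];
    undoLog.push_back([this, index, old] { ops[index] = old; });
    ops[index] = std::move(replacement);
  }

  // Undo all journaled mutations, newest first.
  void rollback() {
    for (auto it = undoLog.rbegin(); it != undoLog.rend(); ++it)
      (*it)();
    undoLog.clear();
  }

private:
  std::vector<std::string> &ops;
  std::vector<std::function<void()>> undoLog;
};

// Simulates a conversion that fails after one journaled mutation and one
// direct mutation. Only the journaled mutation is undone by the rollback.
inline std::vector<std::string> demoFailedConversion() {
  std::vector<std::string> ops = {"std.load", "std.store"};
  ToyRewriter rewriter(ops);
  rewriter.replaceOp(0, "llvm.load"); // tracked by the rewriter
  ops[1] = "llvm.store";              // bypasses the rewriter: untracked
  rewriter.rollback();                // the conversion "failed"
  return ops;
}
```

After the rollback, the list holds one original op and one converted op, which is the kind of half-rewritten state that produces invalid IR or crashes.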

This revision now requires changes to proceed. Wed, Feb 19, 11:51 PM
flaub marked 4 inline comments as done. Thu, Feb 20, 12:13 AM
flaub added inline comments.
mlir/include/mlir/Dialect/StandardOps/Ops.td
262

OK, I suppose that will work here for now. I could see this being expanded later if the lowering supported more types (which in theory it could by lowering to cmpxchg or others).

mlir/lib/Conversion/StandardToLLVM/ConvertStandardToLLVM.cpp
2698

OK, thanks for clarifying. I'll try to use rewriter.replaceOp instead.

2764

Will do.

mlir/lib/Dialect/StandardOps/Ops.cpp
2919

Cool, I will try that, thanks (sounds like a worthwhile tradeoff).

flaub updated this revision to Diff 245598. Thu, Feb 20, 12:57 AM
  • Review feedback
flaub updated this revision to Diff 245599. Thu, Feb 20, 1:03 AM
  • Fix example