Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/test/Dialect/GPU/all-reduce.mlir | ||
---|---|---|
7–176 ↗ | (On Diff #236000) | These CHECKs were generated from the output with: sed -r \ -e 's|\t+//.*||' \ -e 's|%([a-z_0-9]+) = |%[[v\1:[a-z_0-9]+]] = |' \ -e 's|\(%([a-z_0-9]+): ([a-z_0-9]+)\):|(%[[v\1:[a-z_0-9]+]]: \2):|' \ -e 's|%([a-z_0-9]+)|%[[v\1]]|g' \ -e 's|bb([0-9]+)|bb[[#b\1]]|g' \ -e 's|^ | // CHECK:|' and manaul edits from there. |
Unit tests: pass. 61127 tests passed, 0 failed and 728 were skipped.
clang-tidy: fail. Please fix clang-tidy findings.
clang-format: pass.
Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Wrong clang-tidy checks are annoying here, will make another pass later.
mlir/include/mlir/Dialect/GPU/GPUOps.td | ||
---|---|---|
163 ↗ | (On Diff #236000) | Style nit: add whitespace around "+" |
mlir/lib/Dialect/GPU/Transforms/AllReduceLowering.cpp | ||
29 | Nit: explicit is unnecessary for multi-argument constructors. | |
39 | Nit: can we use "neutral" terminology: shared memory -> workgroup memory, block -> workgroup ? | |
47 | Should this be store %warp_reduce, %workgroup_buffer[%warp_id]? | |
52 | Should this be cond_br %is_valid_warp instead? | |
67 | Big and mechanical change: Value is now value-based, please use it without pointers. | |
81 | Style nit: add a comment /*bitwidth=*/ for 32 | |
88 | Not sure I follow here. numThreadsWithSmallerWarpId is the equivalent of warp_id * warp_size, or floor(linear_thread_id, warp_size) * warp_size. Subtracting that from blockSize will give you the number of threads in all warps starting from the current warp. From skimming through the consumer (createWarpReduce), it looks like what it expects is the number of threads in the _current_ warp. | |
104 | This looks like you want EDSC :) | |
138 | Maybe you can store the location and pass it as first argument? It's the same for all operations you create above. | |
150 | Nit: let's have a named constant for the address space | |
201 | Why only F32? Could you have isa<FloatType>() instead? | |
243 | Is this lambda worse it? | |
mlir/test/Dialect/GPU/all-reduce.mlir | ||
7 ↗ | (On Diff #236000) | Is there a reason why you can't use .* as a regexp for names? Seeing the full with ranges makes it hard to read the test. I think we also support capital letters in the names. |
7–176 ↗ | (On Diff #236000) | We have https://github.com/llvm/llvm-project/blob/master/mlir/utils/generate-test-checks.py, have you tried it? |
Thanks a lot for the review, Alex!
mlir/lib/Dialect/GPU/Transforms/AllReduceLowering.cpp | ||
---|---|---|
29 | Removed. Nit the nit: explicit does prevent copy-initialization from initializer list. | |
88 | Correct. The activeWidth does not need to be clamped to subgroup width though. I added two comments. | |
150 | Punted by just adding a local variable. | |
243 | Haha, I like your comment. Very subtle ;-) | |
mlir/test/Dialect/GPU/all-reduce.mlir | ||
7 ↗ | (On Diff #236000) | Replaced with generate-test-checks.py result. |
7–176 ↗ | (On Diff #236000) | Works very well, I wish I knew about this (well, Mehdi mentioned it but I couldn't find it). |
Unit tests: pass. 61127 tests passed, 0 failed and 728 were skipped.
clang-tidy: fail. Please fix clang-tidy findings.
clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.
Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
mlir/lib/Dialect/GPU/Transforms/AllReduceLowering.cpp | ||
---|---|---|
352 | See two lines above: every time the callback is invoked. This makes sure that all occurrences of gpu.all_reduce in the same gpu.function are replaced. |
Unit tests: pass. 61127 tests passed, 0 failed and 728 were skipped.
clang-tidy: fail. Please fix clang-tidy findings.
clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.
Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Almost there, only minor things necessary to improve understanding.
The approach you took for building code is very similar to our motivation for declarative builders (aka EDSC), maybe it's time to reconsider those again.
mlir/lib/Dialect/GPU/Transforms/AllReduceLowering.cpp | ||
---|---|---|
150 | I'll add that later anyway :) | |
268 | I'd add an assertion that predicatedOpsFactory() does not return any values that it might expect to be passed to the continuation. | |
354 | Could you please add a comment on why this approach is necessary? I understand that you need to operate on the GPUFuncOp level because you modify the GPUFuncOp itself, which would be incorrect from a nested operation (AllReduceOp). I don't understand why do you need to interrupt the walk after each rewrite. Is it because of some state invalidation? |
The approach you took for building code is very similar to our motivation for declarative builders (aka EDSC), maybe it's time to reconsider those again.
Yes, I'm happy to change this to EDSC as a follow-up. For now I think it is easier to keep it similar to the existing lowering to NVVM.
mlir/lib/Dialect/GPU/Transforms/AllReduceLowering.cpp | ||
---|---|---|
150 | Thanks! | |
354 | Correct, the walk iterators get invalidated from the replace. Comment added. |
Unit tests: unknown.
clang-tidy: fail. Please fix clang-tidy findings.
clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.
Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt
What is the status re EDSC discussion?
Were the pointers I sent offline enough to give a good picture / do you see how to followup on this to use (and maybe extend) EDSCs?
Yes, from scanning the doc and code it looks like this this shouldn't be too difficult. But I haven't actually tried.
Unit tests: unknown.
clang-tidy: unknown.
clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt
Unit tests: fail. 61801 tests passed, 1 failed and 781 were skipped.
failed: MLIR.Dialect/GPU/all-reduce.mlir
clang-tidy: unknown.
clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Unit tests: fail. 61801 tests passed, 1 failed and 781 were skipped.
failed: MLIR.Dialect/GPU/all-reduce.mlir
clang-tidy: unknown.
clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
What is the status re EDSC discussion?
Were the pointers I sent offline enough to give a good picture / do you see how to followup on this to use (and maybe extend) EDSCs?
Could you keep me in the loop on that? I have an even simpler proposal in mind that could reconcile EDSC with imperative builders.
I pointed Christian to the EDSC doc (https://mlir.llvm.org/docs/EDSC/) and the builder-api-test but otherwise nothing else was discussed.
I didn't look in the details of what would be involved in making the transition.
Can you share what the simpler proposal, than just using the declarative builders, is?
I'm interested to see some movement on this, at this point EDSC isn't totally endorsed by the whole team, and the uncontrolled use in the codebase isn't a great situation.
I have reviewed this before (the basic approach has not changed). Thanks for adding more comments and tests.
LGTM.
Regarding the EDSC vs. use of locally grown alternative in this patch: I try to avoid doing things like the template for adding location. While it saves some characters, it makes code look unfamiliar, which is also a cost. Adding helpers to emit conditionals etc. on the other hand reduces repetition, so I see that as a benefit. Still, I am fine with landing this as it seems to be the plan to evolve it to some EDSC like thing once there is an endorsed one.
@mehdi_amini @nicolasvasilache @herhut Let's take some time and discuss builder APIs outside this diff (also involving @rriddle). My basic observations are that (1) writing structured IR, as in "with nested regions", looks unnecessarily complicated with builders, arguments are the same as those against goto-style programming; (2) a lot of IR construction internally happens in rewrite patterns, where location almost always remains the same, that of the matched operation root; (3) current EDSC APIs are contentious partly because it is unclear when reading the code when the function call creates the IR vs. when it's just a function call.
@csigg I'm fine accepting this, but the test is currently broken. Please fix and we'll be ready to land.
@herhut @ftynse @mehdi_amini sounds good, it's high time to rediscuss this and to make MLIR play nicely with metaprogramming in a way that feels comfortable to everyone.
Unit tests: unknown.
clang-tidy: unknown.
clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt
Unit tests: pass. 62015 tests passed, 0 failed and 783 were skipped.
clang-tidy: unknown.
clang-format: pass.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Nit: explicit is unnecessary for multi-argument constructors.