This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops
ClosedPublic

Authored by navdeepkk on Jan 24 2021, 11:17 PM.

Details

Summary

[MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply
accumulate GPU ops
Add conversion of warp synchronous matrix-multiply accumulate GPU ops to
NVVM ops. The following conversions are added :-

1.) subgroup_mma_load_matrix -> wmma.m16n16k16.load.[a,b,c]..row.stride
2.) subgroup_mma_store_matrix -> wmma.m16n16k16.store.d.[f16,f32].row.stride
3.) subgroup_mma_compute -> wmma.m16n16k16.mma.row.row.[f16,f32].[f16,f32]

Diff Detail

Event Timeline

navdeepkk created this revision.Jan 24 2021, 11:17 PM
navdeepkk requested review of this revision.Jan 24 2021, 11:17 PM

This needs to be rebased on master, I have removed LLVMType several weeks ago.

ftynse requested changes to this revision.Jan 25 2021, 5:28 AM
This revision now requires changes to proceed.Jan 25 2021, 5:28 AM
navdeepkk updated this revision to Diff 321330.Feb 4 2021, 12:00 AM

Changes in this diff :-

1.) Rebase on master to drop the use of LLVMType.
2.) Make changes to use the !gpu.mmafragment type introduced in parent
  revision D95330.
ftynse added a comment.Feb 9 2021, 6:22 AM

I haven't done a detailed review since this commit may change if its parent changes.

mlir/lib/Conversion/GPUToNVVM/CommonTypes.h
1 ↗(On Diff #321330)

Headers need to have -*- C++ -*- at the end of the first line

mlir/lib/Conversion/GPUToNVVM/WmmaLoadOptoNvvmLowering.h
13 ↗(On Diff #321330)

I don't see the point of putting _implementations_ of conversion patterns into separate _header_ files.

mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
16–37

I don't think we care if these operations are on the next line or not. Furthermore, they seem to be testing the conversion of allocations, which isn't anyhow relevant to what this test is about. Such tests lead to excessive churn and spurious breakages. Please only test what the new code does.

ftynse requested changes to this revision.Feb 9 2021, 6:22 AM
This revision now requires changes to proceed.Feb 9 2021, 6:22 AM
navdeepkk updated this revision to Diff 324328.Feb 17 2021, 8:54 AM
navdeepkk marked 3 inline comments as done.

Changes in this diff :-

1.) Address comments on diff 321330.
navdeepkk updated this revision to Diff 342267.May 2 2021, 1:24 PM

Changes in this diff :-

1.) Rebase on upstream/main.
3.) Make changes to operate with the newly intoduced gpu.mma_matrix type.
bondhugula added inline comments.May 2 2021, 5:54 PM
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
146

C++ style comment here. Terminate with full stop.

mlir/lib/Conversion/GPUToNVVM/WmmaLoadStoreToNvvmLowering.h
31 ↗(On Diff #342267)

Cannot -> cannot

36 ↗(On Diff #342267)

Typo: implements

64 ↗(On Diff #342267)

Expected -> expected

73 ↗(On Diff #342267)

Remove commented out code please.

302–307 ↗(On Diff #342267)

Nit: In such cases, consider using braces for the then and else blocks.

mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
42–57

Nit: Leave a space after //, i.e., CHECK -> CHECK

bondhugula added inline comments.May 2 2021, 5:59 PM
mlir/lib/Conversion/GPUToNVVM/CommonTypes.h
25 ↗(On Diff #342267)

Many of these are now built-in MLIR types. common LLVM -> 'common LLVM and builtin MLIR`

62–63 ↗(On Diff #342267)

The context here isn't clear as to operands to which op. Mention about mma/wmma when referring to fragments?

74–75 ↗(On Diff #342267)

Doc comments for these two.

mlir/lib/Conversion/GPUToNVVM/WmmaLoadStoreToNvvmLowering.h
119 ↗(On Diff #342267)

Drop commented out code.

navdeepkk edited the summary of this revision. (Show Details)May 5 2021, 9:54 PM
navdeepkk updated this revision to Diff 343284.May 5 2021, 10:01 PM
navdeepkk marked 11 inline comments as done.

Changes in this diff :-

1.) Address comments on previous diff(342267).
bondhugula added inline comments.May 6 2021, 7:48 AM
mlir/lib/Conversion/GPUToNVVM/CommonTypes.h
15 ↗(On Diff #343284)

clang-tidy warning if this is a meaningful one.

mlir/lib/Conversion/GPUToNVVM/WmmaLoadStoreToNvvmLowering.h
82–90 ↗(On Diff #343284)

All of this code in header files instead of a source file? Is this an exception for some reason?

ftynse requested changes to this revision.May 6 2021, 8:15 AM
ftynse added inline comments.
mlir/lib/Conversion/GPUToNVVM/WmmaLoadStoreToNvvmLowering.h
25 ↗(On Diff #343284)

Add a documentation comment please.

47 ↗(On Diff #343284)

How about having this class (privately) derive CommonLLVMAndBuiltInMLIRTypes instead of containing it? This will let it use the fields directly without the annoying prefix.

Also, I'm not entirely certain why this is even needed. Most patterns don't need all of the types, yet this will create the types inside each pattern, taking locks in the context. Yes, we won't need to do it every time the pattern is applied, but it feels like the number of spuriously created types compensates this. Another alternative would be to have a single instance of CommonLLVMAndBuiltInMLIRTypes and pass a reference to this instance to each pattern.

64 ↗(On Diff #343284)

Typo: meref -> memref ?

75 ↗(On Diff #343284)

Please expand auto here.

75 ↗(On Diff #343284)

Also, I suppose you need promoteOneMemRefDescriptor instead. Then you can wrap the result into a MemRefDescriptor class and get much more readable code below.

76 ↗(On Diff #343284)

Could this use OpAdaptor to get named operand accessors instead of magic constants?

82–90 ↗(On Diff #343284)

I made the same comment on the first version of this patch, there is no point for this code to be in a header, even if the header lives under lib/.

93 ↗(On Diff #343284)

Please avoid magic constants.

186 ↗(On Diff #343284)

Same comments as above in this class.

302–307 ↗(On Diff #342267)

This doesn't look done, did you forget to upload?

mlir/lib/Conversion/GPUToNVVM/WmmaMmaOptoNvvmLowering.h
35–44 ↗(On Diff #343284)

A similar function is present in another file this patch is adding. Consider refactoring to only have one definition.

60 ↗(On Diff #343284)

SmallVector now has a default number of stack elements. Drop the manually specified number unless there is a specific reason to choose the value. Here, I don't see why 24 is special.

87 ↗(On Diff #343284)

Please use OpAdaptors here and below to avoid operands[magic-constant]

mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
16–37

I still see a lot of CHECK-NEXT, and I am not convinced that we should care about these operations being on subsequent strings. If tomorrow we decide to change the syntax of llvm.extractvalue to print the type a separate line, these tests would break for no good reason.

This revision now requires changes to proceed.May 6 2021, 8:15 AM
navdeepkk marked 16 inline comments as done.

Changes in this diff :-

1.)  Address comments on previous diff(343284).
navdeepkk updated this revision to Diff 345382.May 14 2021, 2:51 AM

Changes in this diff :-

1.) Fix formatting in WmmaOpsToNvvmLowering.cpp.
bondhugula accepted this revision.May 16 2021, 12:00 AM
ftynse requested changes to this revision.May 20 2021, 1:53 AM

This is okay with me except the splitting between files. I really don't understand the motivation behind adding header files to lib/. It is justified if there are some declarations or template private implementations that must be shared between several .cpp files and not visible to the external users, but here is is not the case.

This can be organized as follows:

  • the file mlir/include/mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h contains a populateGPUWMMAPAtterns(...) next to the populateGPUToNVVM that it already has;
  • there's one file, mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNVVM.cpp (also, drop the "lowering" from the name while we are here), that contains everything from CommonTypes.h, and the current header/source pair, all in an anonymous namespace to avoid exporting the names and slowing down the linker;
  • whoever needs this patterns just includes the header and calls the populate function.

This is simple to navigate, reason about and is the pattern that all other conversions adopt.

mlir/lib/Conversion/GPUToNVVM/CommonTypes.h
1 ↗(On Diff #321330)

Looks like this comment wasn't addressed..

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
146

Doesn't look addressed. C++-style comments are line comments starting with //.

mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvmLowering.cpp
31 ↗(On Diff #345382)

Nit: add braces here, this isn't a trivial condition anymore.

50 ↗(On Diff #345382)

I would have just used unsigned here.

140–142 ↗(On Diff #345382)

Nit: can you put this comment inside return rewriter.notifyMatchFailure("") instead? Here and below.

This revision now requires changes to proceed.May 20 2021, 1:53 AM
navdeepkk marked 5 inline comments as done.May 21 2021, 3:51 AM

This is okay with me except the splitting between files. I really don't understand the motivation behind adding header files to lib/. It is justified if there are some declarations or template private implementations that must be shared between several .cpp files and not visible to the external users, but here is is not the case.

This can be organized as follows:

  • the file mlir/include/mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h contains a populateGPUWMMAPAtterns(...) next to the populateGPUToNVVM that it already has;
  • there's one file, mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNVVM.cpp (also, drop the "lowering" from the name while we are here), that contains everything from CommonTypes.h, and the current header/source pair, all in an anonymous namespace to avoid exporting the names and slowing down the linker;
  • whoever needs this patterns just includes the header and calls the populate function.

This is simple to navigate, reason about and is the pattern that all other conversions adopt.

Hi @ftynse,
Thanks for the insightful comments. I have a question regarding this new structure. Once I put all the code in WmmaOpsToNVVM.cpp the lowering patterns are no longer exposed. So I cannot directly use them in LowerGpuOpsToNVVMOps.cpp where they are actually added into RewritePatternSet. The way I could find is to include WmmaOpsToNVVm.cpp in LowerGpuOpsToNVVMOps.cpp. Is that okay to do in this context? Or can we just expose the lowerings in a header file?

mlir/lib/Conversion/GPUToNVVM/CommonTypes.h
15 ↗(On Diff #343284)

The style given here already followed. https://llvm.org/docs/CodingStandards.html#header-guard

mlir/lib/Conversion/GPUToNVVM/WmmaLoadOptoNvvmLowering.h
13 ↗(On Diff #321330)

We may add more versions of NVVM ops in the future, so each lowering file may grow large. To save hassle later, I kept them in separate files. Let me know if you want me to merge them.

mlir/lib/Conversion/GPUToNVVM/WmmaLoadStoreToNvvmLowering.h
47 ↗(On Diff #343284)

Took the first approach and removed all the spurious types present. Now the types are minimal and the overheads would be less. Let me know what you think.

Hi @ftynse,
Thanks for the insightful comments. I have a question regarding this new structure. Once I put all the code in WmmaOpsToNVVM.cpp the lowering patterns are no longer exposed. So I cannot directly use them in LowerGpuOpsToNVVMOps.cpp where they are actually added into RewritePatternSet.

You shouldn't even need to use them directly. Just declare a function populateWmmaToNVVMPatterns(RewritePatternSet &set) in GPUToNVVMPass.h, put its declaration in the header and implementation in WmmaOpsToNVVM.cpp. Then call this function from LowerGpuOpsToNVVMOps.cpp. You understand how function declaration works in C++, right? It's enough for the caller to see the declaration (not definition) to be able to call the function.

The way I could find is to include WmmaOpsToNVVm.cpp in LowerGpuOpsToNVVMOps.cpp. Is that okay to do in this context?

Absolutely not! It's almost never okay to include a .cpp into another .cpp.

Or can we just expose the lowerings in a header file?

You don't need to expose the classes, the function that adds instances of these classes into the set will amply suffice, as I mentioned repeatedly. If you look around a bit in the code base, LowerGpuOpsToNVVMOps.cpp defines the pattern classes in an anonymous namespace and they are not at all visible in the headers. The users of this lowering just call populateGpuToNVVMConversionPatterns and never ever need to see the classes. I see no reason why you can't just do the same.

navdeepkk updated this revision to Diff 347000.May 21 2021, 5:48 AM

Changes in this diff :-

1.) Address comments on diff 345382.

navdeepkk updated this revision to Diff 347002.May 21 2021, 5:53 AM

Changes in this diff :-

1.) Fix spelling in file title.

ftynse accepted this revision.May 21 2021, 6:02 AM

Thanks!

This revision is now accepted and ready to land.May 21 2021, 6:02 AM

Thanks!

Thanks for the long series of great insightful comments. And apologies for taking too long in the re-structuring part.

bondhugula accepted this revision.May 21 2021, 6:25 AM

Thanks!

Thanks for the long series of great insightful comments. And apologies for taking too long in the re-structuring part.

There appears to be something weird with this patch - doing an arc patch D95331 yields an exception and then shows zero changes. Looks like an arc bug:

...
Applied patch mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h cleanly.
Applied patch mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td cleanly.
Applied patch mlir/include/mlir/Dialect/GPU/GPUOps.td cleanly.
Applied patch mlir/include/mlir/Dialect/GPU/GPUDialect.h cleanly.
Applied patch mlir/include/mlir/Dialect/GPU/GPUBase.td cleanly.
 COMMITTED  Successfully committed patch.

 Cherry Pick Failed!
 Exception 
Command failed with error #1!
COMMAND
git cherry-pick -- arcpatch-D95330

STDOUT
Auto-merging mlir/test/Dialect/LLVMIR/invalid.mlir
Auto-merging mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
Auto-merging mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
Auto-merging mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
Auto-merging mlir/include/mlir/Dialect/GPU/GPUOps.td
On branch arcpatch-D95331_1
You are currently cherry-picking commit 372dcf47bd93.
  (all conflicts fixed: run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	mlir/0001-MLIR-Update-affine.for-unroll-for-iter_args-support.patch
	mlir/compile_commands.json
	mlir/tags
	mlir/tf.mlir

nothing added to commit but untracked files present (use "git add" to track)


STDERR
The previous cherry-pick is now empty, possibly due to conflict resolution.
If you wish to commit it anyway, use:

    git commit --allow-empty

Otherwise, please use 'git cherry-pick --skip'

(Run with `--trace` for a full exception trace.)

Using --trace shows:

Otherwise, please use 'git cherry-pick --skip'
 at [<arcanist>/src/future/exec/ExecFuture.php:421]
arcanist(head=master, ref.master=239ad5c55d8d)
  #0 ExecFuture::raiseResultError(array) called at [<arcanist>/src/future/exec/ExecFuture.php:325]
  #1 ExecFuture::resolvex() called at [<arcanist>/src/repository/api/ArcanistRepositoryAPI.php:399]
  #2 ArcanistRepositoryAPI::execxLocal(string, string) called at [<arcanist>/src/workflow/ArcanistPatchWorkflow.php:778]
  #3 ArcanistPatchWorkflow::run() called at [<arcanist>/src/workflow/ArcanistPatchWorkflow.php:983]
  #4 ArcanistPatchWorkflow::applyDependencies(ArcanistBundle) called at [<arcanist>/src/workflow/ArcanistPatchWorkflow.php:469]
  #5 ArcanistPatchWorkflow::run() called at [<arcanist>/src/workflow/ArcanistPatchWorkflow.php:398]
  #6 ArcanistPatchWorkflow::run() called at [<arcanist>/scripts/arcanist.php:419]
<<< [1] (+15,126) <exec> 15,126,163 us
navdeepkk updated this revision to Diff 347046.May 21 2021, 8:45 AM

Rebase on upstream/main.

This revision was landed with ongoing or failed builds.May 21 2021, 8:51 AM
This revision was automatically updated to reflect the committed changes.