Page MenuHomePhabricator

navdeepkk (Navdeep Kumar)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 6 2020, 6:45 AM (89 w, 5 d)

Recent Activity

Jun 29 2021

navdeepkk accepted D105170: [MLIR] Fix generateCopyForMemRefRegion.
Jun 29 2021, 10:14 PM · Restricted Project

Jun 10 2021

navdeepkk accepted D103870: [mlir][gpu] Add op to create MMA constant matrix.

This is a great addition. We can bring in a scaling op also which scales mmaMatrix by a certain value. Maybe I can take that up.

It would be nice to be able to handle most of the element-wise ops, ideally we should re-use the std ops but it looks like this would require infrastructure changes to MLIR (https://llvm.discourse.group/t/using-gpu-type-with-standard-ops/3542/2). The best short term solution is probably to add an op taking an attribute like GPU_AllReduceOperationAttr. This is a bit hacky but that would allow us to be able to generate interesting code using the mma ops.

Jun 10 2021, 8:13 AM · Restricted Project

Jun 9 2021

navdeepkk requested changes to D103870: [mlir][gpu] Add op to create MMA constant matrix.

This is a great addition. We can bring in a scaling op also which scales mmaMatrix by a certain value. Maybe I can take that up.

Jun 9 2021, 9:27 AM · Restricted Project

May 27 2021

navdeepkk added a comment to D103099: [mlir] Fix gpu MMA integrations tests.

Thanks! this is something I wasn't aware of. BTW I tested these on a Turing with CUDA10.2, and they passed, but maybe they fail on some other devices.

Yes I ended up doing a more generic fix for the problem as it causing more general problems: https://reviews.llvm.org/D103187

This still feels like a small improvement so I'll move forward with this patch unless you have any concerns.

May 27 2021, 9:21 AM · Restricted Project
navdeepkk accepted D103099: [mlir] Fix gpu MMA integrations tests.

Thanks! this is something I wasn't aware of. BTW I tested these on a Turing with CUDA10.2, and they passed, but maybe they fail on some other devices.

May 27 2021, 9:12 AM · Restricted Project
navdeepkk accepted D103023: [mlir][gpu] Relax restriction on MMA store op to allow chain of mma ops..

This is a great addition.

May 27 2021, 8:53 AM · Restricted Project

May 25 2021

navdeepkk added a comment to D103023: [mlir][gpu] Relax restriction on MMA store op to allow chain of mma ops..

Note that I was also considering just removing the "DOp" altogether and have it always use "COp" but I didn't know if it was consistent with the direction you have in mind. This would be a good step in the direction of potentially removing those operands altogether.
Let me know what you think.

May 25 2021, 12:05 AM · Restricted Project

May 22 2021

navdeepkk added a comment to D102953: [mlir][GPU] Make MMAMatrixStorage hash unique for a given type..

Great to see this, But I am not fully sure why the earlier key did not work? Isn't the underlying data used to generate the hashCode for ArrayRef<> and StirngRef<>. Here are some references from llvm/include/llvm/ADT/ArrayRef.h and llvm/lib/Support/StringRef.cpp.

template <typename T> hash_code hash_value(ArrayRef<T> S) {
  return hash_combine_range(S.begin(), S.end());
}
// Implementation of StringRef hashing.
hash_code llvm::hash_value(StringRef S) {
  return hash_combine_range(S.begin(), S.end());
}
May 22 2021, 11:45 PM · Restricted Project
navdeepkk updated the diff for D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.

Fix commit summary and title.

May 22 2021, 3:45 AM · Restricted Project
navdeepkk retitled D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test from [MLIR][GPU] Add WMMA Tensor core matmul test to [MLIR][GPU] Add CUDA Tensor core WMMA test.
May 22 2021, 3:43 AM · Restricted Project
navdeepkk retitled D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test from [MLIR][CUDA-RUNNER] Add WMMA Tensor core matmul test to [MLIR][GPU] Add WMMA Tensor core matmul test.
May 22 2021, 3:42 AM · Restricted Project
navdeepkk updated the summary of D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.
May 22 2021, 3:40 AM · Restricted Project
navdeepkk updated the diff for D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.

Changes in this diff :-

May 22 2021, 3:22 AM · Restricted Project
navdeepkk closed D95333: [MLIR][NVVM] Add test cases to check translation of matrix-multiply accumulate ops to the corresponding intrinsics in NVPTX backend.

Closing revision as changes here were moved to D95331.

May 22 2021, 3:09 AM · Restricted Project
navdeepkk closed D95332: [MLIR][CUDA-RUNNER] Add CL options to pass SM version and index-bitwidth.

Closing revision, As recent changes(removal of mlir-cuda-runner and others) have baked the functionality of this patch into convert-gpu-to-nvvm and gpu-to-cubin.

May 22 2021, 3:07 AM · Restricted Project
navdeepkk updated the diff for D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.

Changes in this diff :-

May 22 2021, 2:52 AM · Restricted Project
navdeepkk updated the diff for D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.

Changes in this diff:-

May 22 2021, 2:22 AM · Restricted Project

May 21 2021

navdeepkk updated the diff for D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.

Rebase on upstream/main.

May 21 2021, 8:49 AM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Rebase on upstream/main.

May 21 2021, 8:45 AM · Restricted Project
navdeepkk added a comment to D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Thanks!

May 21 2021, 6:10 AM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff :-

May 21 2021, 5:53 AM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff :-

May 21 2021, 5:48 AM · Restricted Project
navdeepkk added a comment to D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

This is okay with me except the splitting between files. I really don't understand the motivation behind adding header files to lib/. It is justified if there are some declarations or template private implementations that must be shared between several .cpp files and not visible to the external users, but here is is not the case.

This can be organized as follows:

  • the file mlir/include/mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h contains a populateGPUWMMAPAtterns(...) next to the populateGPUToNVVM that it already has;
  • there's one file, mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNVVM.cpp (also, drop the "lowering" from the name while we are here), that contains everything from CommonTypes.h, and the current header/source pair, all in an anonymous namespace to avoid exporting the names and slowing down the linker;
  • whoever needs this patterns just includes the header and calls the populate function.

This is simple to navigate, reason about and is the pattern that all other conversions adopt.

May 21 2021, 3:51 AM · Restricted Project

May 14 2021

navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff :-

1.) Fix formatting in WmmaOpsToNvvmLowering.cpp.
May 14 2021, 2:51 AM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff :-

1.)  Address comments on previous diff(343284).
May 14 2021, 12:13 AM · Restricted Project

May 5 2021

navdeepkk updated the diff for D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Changes in this diff :-

1.) Clang-format fix.
2.) Added TODO to generate MMAMatrix via ODS.
May 5 2021, 11:23 PM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff :-

1.) Address comments on previous diff(342267).
May 5 2021, 10:01 PM · Restricted Project
navdeepkk updated the summary of D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.
May 5 2021, 9:54 PM · Restricted Project
navdeepkk updated the summary of D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.
May 5 2021, 9:53 PM · Restricted Project
navdeepkk updated the diff for D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Changes in this diff :-

1.) Address comments on previous diff(342265).
May 5 2021, 9:50 PM · Restricted Project

May 2 2021

navdeepkk updated the diff for D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.

Changes in this diff :-

1.) Make changes to operate with the newly intoduced gpu.mma_matrix type.
May 2 2021, 1:44 PM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff :-

1.) Rebase on upstream/main.
3.) Make changes to operate with the newly intoduced gpu.mma_matrix type.
May 2 2021, 1:24 PM · Restricted Project
navdeepkk updated the diff for D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Changes in this diff :-

1.) Rebase on upstream/main.
2.) Address comments on previous diff(324324).
3.) Remove gpu.mmafragment and introduce gpu.mma_matrix type.
May 2 2021, 1:19 PM · Restricted Project

Apr 27 2021

navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

@navdeepkk, @bondhugula, any update on this?

Apr 27 2021, 10:12 PM · Restricted Project

Apr 12 2021

navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi @ThomasRaoux,
Sorry for the late reply. Great to hear that these ops can be reused in the IREE pipeline too. I was actually busy in some parallel work using these ops and getting it ready for an upcoming submission. The comments regarding the types are still to be addressed. I will surely be working on this, But I will get started on any major changes only by next week. As you mention, It would be great to know what your plans are and how you wish to proceed.

Apr 12 2021, 8:23 PM · Restricted Project

Mar 7 2021

navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi all, Thanks for the valuable comments. @ThomasRaoux Thanks for clarifying things on the SPIR-V side.

Mar 7 2021, 9:22 AM · Restricted Project

Mar 3 2021

navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi, Thanks for the comments.

A high-level design question: why does the element type of mmafragment have to be a vector type? I'd just use 2D indexing for the fragment, it's not like we are going to extract vectors from it.

I have tried to keep the types as close to what is expected by the corresponding LLVM intrinsics. As the mma.compute intrinsic expects operands in <2 x half> form and also returns things in a similar form, I have used the vector type.

This is an anti-argument for me, I see very little value in just lifting the low-level LLVM abstractions to higher levels. Hardcoding NVVM-specific modeling in the GPU dialect that is
supposed to abstract that away defies the purpose of the GPU dialect. It sounds like memfragment<AxBxf16> would make all of the code, except for a tiny part of the conversion, simpler.

But this op is a low-level abstraction. It may be in the GPU dialect but I don't see a value in raising the abstraction to only lower it again immediately with only one path out of it. Having some ops that are tied to specialized lower level instructions sounds like a reasonable tradeoff to me and by no means against the purpose of the GPU dialect. The abstraction could be raised if there are other lower level ops of that nature that this could be mapped to. As pointed out downthread, trying to put abstractions in here would mean that the operands would have to be packed into <2xhalf> from scalar values. This would just be additional deabstraction and lowering for yet no good reason: it can always be done if there are other backends served by it in the future but even that looks unlikely given how specialized this operation is.

You are right that this op is a low-level abstraction, and that's why it doesn't feel like it really belongs to the GPU dialect. I understand there is a need for an op that is better integrated with the rest of MLIR, e.g., uses memref, and such an op wouldn't fit into the NVVM dialect either. So I wouldn't press my objection as long as another reviewer (@herhut, @csigg or @ThomasRaoux) agrees to have this in GPU dialect as is.

More generally, I think we will run into this problem again: we need some way of having MLIR-friendlier versions of LLVM IR intrinsics without having to duplicate abstractions. There are similar cases in "target vector" dialects like AVX512. Ideas welcome.

I think it would be good to have it in the GPU dialect to have a common layer for SPIR-V, NVVM and potentially ROCDL if it applies. The challenge is to pick a good abstraction for all of those. SPIR-V dialect already has CooperativeMatrix ops and types which are the exact equivalent to MMA. I think we should try to remove what is an overfit for NVVM like using vector type for mmafragment but otherwise this is the right direction in my opinion.

Mar 3 2021, 9:21 PM · Restricted Project

Feb 17 2021

navdeepkk updated the diff for D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.

Changes in this diff :-

Feb 17 2021, 8:57 AM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff :-

Feb 17 2021, 8:54 AM · Restricted Project
navdeepkk updated the diff for D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Changes in this diff:-

Feb 17 2021, 8:50 AM · Restricted Project

Feb 9 2021

navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi, Thanks for the comments.

A high-level design question: why does the element type of mmafragment have to be a vector type? I'd just use 2D indexing for the fragment, it's not like we are going to extract vectors from it.

I have tried to keep the types as close to what is expected by the corresponding LLVM intrinsics. As the mma.compute intrinsic expects operands in <2 x half> form and also returns things in a similar form, I have used the vector type.

This is an anti-argument for me, I see very little value in just lifting the low-level LLVM abstractions to higher levels. Hardcoding NVVM-specific modeling in the GPU dialect that is supposed to abstract that away defies the purpose of the GPU dialect. It sounds like memfragment<AxBxf16> would make all of the code, except for a tiny part of the conversion, simpler.

Feb 9 2021, 9:57 AM · Restricted Project
navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi, Thanks for the comments.

Feb 9 2021, 8:57 AM · Restricted Project

Feb 4 2021

navdeepkk retitled D95332: [MLIR][CUDA-RUNNER] Add CL options to pass SM version and index-bitwidth from [MLIR][CUDA-RUNNER] Add CL option to pass SM version to [MLIR][CUDA-RUNNER] Add CL options to pass SM version and index-bitwidth.
Feb 4 2021, 2:02 AM · Restricted Project
navdeepkk updated the diff for D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.

Changes in this diff :-

1.) Modify the test case to use the !gpu.mmafragment type introduced in 
  revision D95330.
Feb 4 2021, 12:06 AM · Restricted Project
navdeepkk updated the diff for D95332: [MLIR][CUDA-RUNNER] Add CL options to pass SM version and index-bitwidth.

Changes in this diff:-

1.) Add CL option to pass index-bitwidth for LLVM lowering passes
  on the device side.
Feb 4 2021, 12:04 AM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff :-

1.) Rebase on master to drop the use of LLVMType.
2.) Make changes to use the !gpu.mmafragment type introduced in parent
  revision D95330.
Feb 4 2021, 12:00 AM · Restricted Project

Feb 3 2021

navdeepkk updated the diff for D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Issues in diff 318905 :-

1.) The design used memrefs to model mmafragments and were allocated in `.local`
  space in the PTX generated. This compeletely destroyed the purpose of wmmaOps(to
  use operands in `.reg` space.).
Feb 3 2021, 11:56 PM · Restricted Project

Jan 24 2021

navdeepkk added a reviewer for D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test: herhut.
Jan 24 2021, 11:35 PM · Restricted Project
navdeepkk added a reviewer for D95333: [MLIR][NVVM] Add test cases to check translation of matrix-multiply accumulate ops to the corresponding intrinsics in NVPTX backend: herhut.
Jan 24 2021, 11:33 PM · Restricted Project
navdeepkk requested review of D95334: [MLIR][GPU] Add CUDA Tensor core WMMA test.
Jan 24 2021, 11:27 PM · Restricted Project
navdeepkk requested review of D95333: [MLIR][NVVM] Add test cases to check translation of matrix-multiply accumulate ops to the corresponding intrinsics in NVPTX backend.
Jan 24 2021, 11:24 PM · Restricted Project
navdeepkk requested review of D95332: [MLIR][CUDA-RUNNER] Add CL options to pass SM version and index-bitwidth.
Jan 24 2021, 11:21 PM · Restricted Project
navdeepkk requested review of D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.
Jan 24 2021, 11:17 PM · Restricted Project
navdeepkk requested review of D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.
Jan 24 2021, 11:10 PM · Restricted Project

Jan 8 2021

navdeepkk added a comment to D92233: [MLIR][Affine] Add affine.for normalization support.

Hey guys,
I found a problem with this pass; if you feed it the following loop:

func @main(%arg0: memref<10xf32>, %x: f32) {
  %c0 = constant 0 : index
  %c10 = constant 0 : index
  affine.for %i = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%c10) {
    affine.store %x, %arg0[%i] : memref<10xf32>
  }
  return
}

then it will be collapsed to a zero-iteration loop:

#map0 = affine_map<(d0, d1) -> (0)>
#map1 = affine_map<(d0, d1) -> (d1 + d0)>
module  {
  func @main(%arg0: memref<10xf32>, %arg1: f32) {
    %c0 = constant 0 : index
    %c0_0 = constant 0 : index
    affine.for %arg2 = 0 to #map0(%c0_0, %c0) {
      %0 = affine.apply #map1(%c0_0, %arg2)
      affine.store %arg1, %arg0[%0] : memref<10xf32>
    }
    return
  }
}

As a workaround it is possible to run canonicalization before normalization.
Should I report it somewhere like bugzilla (it doesn't seem to be very active though)?

Jan 8 2021, 10:00 PM · Restricted Project

Dec 5 2020

navdeepkk updated the diff for D92233: [MLIR][Affine] Add affine.for normalization support.

Address comments on diff 308060

Dec 5 2020, 10:07 PM · Restricted Project

Nov 27 2020

navdeepkk requested review of D92233: [MLIR][Affine] Add affine.for normalization support.
Nov 27 2020, 7:57 AM · Restricted Project

Sep 17 2020

navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Remove include directive "block.h" from "TestAffineLoopParametricTiling.cpp"

Sep 17 2020, 7:50 AM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on diff(291964).

Sep 17 2020, 2:11 AM · Restricted Project

Sep 16 2020

navdeepkk updated the summary of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 16 2020, 8:10 AM · Restricted Project

Sep 15 2020

navdeepkk added a comment to D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

constructTiledSetHyper... appears to have been moved but not mentioned in the commit summary as an NFC move. Has it been updated in some way?

Sep 15 2020, 10:03 PM · Restricted Project
navdeepkk updated the summary of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 15 2020, 10:02 AM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on diff (291253).

Sep 15 2020, 10:00 AM · Restricted Project

Sep 11 2020

navdeepkk updated the summary of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 11 2020, 8:56 PM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Remove auto from appropriate places in TestAffineLoopParametricTiling.cpp

Sep 11 2020, 9:41 AM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on diff (290923).

Sep 11 2020, 6:26 AM · Restricted Project

Sep 10 2020

navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on diff (290734).

Sep 10 2020, 3:21 AM · Restricted Project

Sep 9 2020

navdeepkk updated the summary of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 9 2020, 7:05 AM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on initial diff (290671)

Sep 9 2020, 7:01 AM · Restricted Project
navdeepkk edited reviewers for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling, added: bondhugula; removed: nicolasvasilache.
Sep 9 2020, 2:01 AM · Restricted Project
navdeepkk requested review of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 9 2020, 1:58 AM · Restricted Project