
navdeepkk (Navdeep Kumar)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 6 2020, 6:45 AM (31 w, 3 d)

Recent Activity

Mon, Apr 12

navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi @ThomasRaoux,
Sorry for the late reply. Great to hear that these ops can be reused in the IREE pipeline too. I was actually busy with some parallel work that uses these ops, getting it ready for an upcoming submission. The comments regarding the types still need to be addressed. I will certainly work on this, but I can only start on any major changes next week. As you mention, it would be great to know what your plans are and how you wish to proceed.

Mon, Apr 12, 8:23 PM · Restricted Project

Mar 7 2021

navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi all, thanks for the valuable comments. @ThomasRaoux, thanks for clarifying things on the SPIR-V side.

Mar 7 2021, 9:22 AM · Restricted Project

Mar 3 2021

navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi, thanks for the comments.

A high-level design question: why does the element type of mmafragment have to be a vector type? I'd just use 2D indexing for the fragment, it's not like we are going to extract vectors from it.

I have tried to keep the types as close as possible to what the corresponding LLVM intrinsics expect. Since the mma.compute intrinsic takes its operands in <2 x half> form and also returns results in a similar form, I have used the vector type.

This is an anti-argument for me; I see very little value in just lifting the low-level LLVM abstractions to higher levels. Hardcoding NVVM-specific modeling in the GPU dialect, which is supposed to abstract that away, defies the purpose of the GPU dialect. It sounds like mmafragment<AxBxf16> would make all of the code, except for a tiny part of the conversion, simpler.

But this op is a low-level abstraction. It may be in the GPU dialect, but I don't see value in raising the abstraction only to lower it again immediately, with only one path out of it. Having some ops that are tied to specialized lower-level instructions sounds like a reasonable trade-off to me, and by no means goes against the purpose of the GPU dialect. The abstraction could be raised if there were other lower-level ops of this nature that it could be mapped to. As pointed out downthread, trying to add abstraction here would mean packing the operands into <2xhalf> from scalar values. That would just be additional de-abstraction and lowering for no good reason: it can always be done if other backends are served by it in the future, but even that looks unlikely given how specialized this operation is.

You are right that this op is a low-level abstraction, and that's why it doesn't feel like it really belongs in the GPU dialect. I understand there is a need for an op that is better integrated with the rest of MLIR, e.g., one that uses memref, and such an op wouldn't fit into the NVVM dialect either. So I won't press my objection as long as another reviewer (@herhut, @csigg or @ThomasRaoux) agrees to have this in the GPU dialect as is.

More generally, I think we will run into this problem again: we need some way of having MLIR-friendlier versions of LLVM IR intrinsics without having to duplicate abstractions. There are similar cases in "target vector" dialects like AVX512. Ideas welcome.

I think it would be good to have it in the GPU dialect to provide a common layer for SPIR-V, NVVM, and potentially ROCDL if it applies. The challenge is to pick a good abstraction for all of those. The SPIR-V dialect already has CooperativeMatrix ops and types, which are the exact equivalent of MMA. I think we should try to remove what is an overfit for NVVM, such as using a vector type for mmafragment, but otherwise this is the right direction in my opinion.
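
To make the packing trade-off discussed above concrete, here is a minimal, hypothetical sketch in standard MLIR (the function name and shapes are illustrative, not syntax from this patch) of the extra re-packing step a scalar-element fragment type would require before lowering to the <2 x half> operands of the NVVM intrinsics:

func @pack_pair(%a: f16, %b: f16) -> vector<2xf16> {
  // Build one <2 x half> operand pair from two scalar f16 values.
  // A fragment type with scalar elements would need this packing for every
  // pair during conversion; a vector<2xf16> element type carries the pairs as-is.
  %zero = constant dense<0.0> : vector<2xf16>
  %p0 = vector.insert %a, %zero[0] : f16 into vector<2xf16>
  %p1 = vector.insert %b, %p0[1] : f16 into vector<2xf16>
  return %p1 : vector<2xf16>
}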

Mar 3 2021, 9:21 PM · Restricted Project

Feb 17 2021

navdeepkk updated the diff for D95334: [MLIR][CUDA-RUNNER] Add WMMA Tensor core matmul test.

Changes in this diff:

Feb 17 2021, 8:57 AM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff:

Feb 17 2021, 8:54 AM · Restricted Project
navdeepkk updated the diff for D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Changes in this diff:

Feb 17 2021, 8:50 AM · Restricted Project

Feb 9 2021

navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi, thanks for the comments.

A high-level design question: why does the element type of mmafragment have to be a vector type? I'd just use 2D indexing for the fragment, it's not like we are going to extract vectors from it.

I have tried to keep the types as close as possible to what the corresponding LLVM intrinsics expect. Since the mma.compute intrinsic takes its operands in <2 x half> form and also returns results in a similar form, I have used the vector type.

This is an anti-argument for me; I see very little value in just lifting the low-level LLVM abstractions to higher levels. Hardcoding NVVM-specific modeling in the GPU dialect, which is supposed to abstract that away, defies the purpose of the GPU dialect. It sounds like mmafragment<AxBxf16> would make all of the code, except for a tiny part of the conversion, simpler.

Feb 9 2021, 9:57 AM · Restricted Project
navdeepkk added a comment to D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Hi, thanks for the comments.

Feb 9 2021, 8:57 AM · Restricted Project

Feb 4 2021

navdeepkk retitled D95332: [MLIR][CUDA-RUNNER] Add CL options to pass SM version and index-bitwidth from [MLIR][CUDA-RUNNER] Add CL option to pass SM version to [MLIR][CUDA-RUNNER] Add CL options to pass SM version and index-bitwidth.
Feb 4 2021, 2:02 AM · Restricted Project
navdeepkk updated the diff for D95334: [MLIR][CUDA-RUNNER] Add WMMA Tensor core matmul test.

Changes in this diff:

1.) Modify the test case to use the !gpu.mmafragment type introduced in revision D95330.
Feb 4 2021, 12:06 AM · Restricted Project
navdeepkk updated the diff for D95332: [MLIR][CUDA-RUNNER] Add CL options to pass SM version and index-bitwidth.

Changes in this diff:

1.) Add CL option to pass index-bitwidth for LLVM lowering passes on the device side.
Feb 4 2021, 12:04 AM · Restricted Project
navdeepkk updated the diff for D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.

Changes in this diff:

1.) Rebase on master to drop the use of LLVMType.
2.) Make changes to use the !gpu.mmafragment type introduced in parent revision D95330.
Feb 4 2021, 12:00 AM · Restricted Project

Feb 3 2021

navdeepkk updated the diff for D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.

Issues in diff 318905:

1.) The design used memrefs to model mmafragments, which were allocated in `.local` space in the generated PTX. This completely defeated the purpose of the wmmaOps (which is to use operands in `.reg` space).
Feb 3 2021, 11:56 PM · Restricted Project

Jan 24 2021

navdeepkk added a reviewer for D95334: [MLIR][CUDA-RUNNER] Add WMMA Tensor core matmul test: herhut.
Jan 24 2021, 11:35 PM · Restricted Project
navdeepkk added a reviewer for D95333: [MLIR][NVVM] Add test cases to check translation of matrix-multiply accumulate ops to the corresponding intrinsics in NVPTX backend: herhut.
Jan 24 2021, 11:33 PM · Restricted Project
navdeepkk requested review of D95334: [MLIR][CUDA-RUNNER] Add WMMA Tensor core matmul test.
Jan 24 2021, 11:27 PM · Restricted Project
navdeepkk requested review of D95333: [MLIR][NVVM] Add test cases to check translation of matrix-multiply accumulate ops to the corresponding intrinsics in NVPTX backend.
Jan 24 2021, 11:24 PM · Restricted Project
navdeepkk requested review of D95332: [MLIR][CUDA-RUNNER] Add CL options to pass SM version and index-bitwidth.
Jan 24 2021, 11:21 PM · Restricted Project
navdeepkk requested review of D95331: [MLIR][GPU][NVVM] Add conversion of warp synchronous matrix-multiply accumulate GPU ops.
Jan 24 2021, 11:17 PM · Restricted Project
navdeepkk requested review of D95330: [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops.
Jan 24 2021, 11:10 PM · Restricted Project

Jan 8 2021

navdeepkk added a comment to D92233: [MLIR][Affine] Add affine.for normalization support.

Hey guys,
I found a problem with this pass; if you feed it the following loop:

func @main(%arg0: memref<10xf32>, %x: f32) {
  %c0 = constant 0 : index
  %c10 = constant 10 : index
  affine.for %i = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%c10) {
    affine.store %x, %arg0[%i] : memref<10xf32>
  }
  return
}

then it will be collapsed to a zero-iteration loop:

#map0 = affine_map<(d0, d1) -> (0)>
#map1 = affine_map<(d0, d1) -> (d1 + d0)>
module  {
  func @main(%arg0: memref<10xf32>, %arg1: f32) {
    %c0 = constant 0 : index
    %c0_0 = constant 0 : index
    affine.for %arg2 = 0 to #map0(%c0_0, %c0) {
      %0 = affine.apply #map1(%c0_0, %arg2)
      affine.store %arg1, %arg0[%0] : memref<10xf32>
    }
    return
  }
}

As a workaround, it is possible to run canonicalization before normalization.
Should I report this somewhere like Bugzilla (it doesn't seem to be very active, though)?
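
For reference, a minimal sketch of the workaround as a test-style input; the -affine-loop-normalize flag is assumed here to be how this pass is registered with mlir-opt, and running -canonicalize first should fold the constant-operand bound maps into literal constant bounds before normalization runs:

// RUN: mlir-opt %s -canonicalize -affine-loop-normalize
func @main(%arg0: memref<10xf32>, %x: f32) {
  %c0 = constant 0 : index
  %c10 = constant 10 : index
  // After canonicalization the bound maps applied to constants fold away,
  // so normalization sees an ordinary constant-bound loop.
  affine.for %i = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%c10) {
    affine.store %x, %arg0[%i] : memref<10xf32>
  }
  return
}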

Jan 8 2021, 10:00 PM · Restricted Project

Dec 5 2020

navdeepkk updated the diff for D92233: [MLIR][Affine] Add affine.for normalization support.

Address comments on diff 308060.

Dec 5 2020, 10:07 PM · Restricted Project

Nov 27 2020

navdeepkk requested review of D92233: [MLIR][Affine] Add affine.for normalization support.
Nov 27 2020, 7:57 AM · Restricted Project

Sep 17 2020

navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Remove the include directive "block.h" from "TestAffineLoopParametricTiling.cpp".

Sep 17 2020, 7:50 AM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on diff 291964.

Sep 17 2020, 2:11 AM · Restricted Project

Sep 16 2020

navdeepkk updated the summary of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 16 2020, 8:10 AM · Restricted Project

Sep 15 2020

navdeepkk added a comment to D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

constructTiledSetHyper... appears to have been moved but not mentioned in the commit summary as an NFC move. Has it been updated in some way?

Sep 15 2020, 10:03 PM · Restricted Project
navdeepkk updated the summary of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 15 2020, 10:02 AM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on diff 291253.

Sep 15 2020, 10:00 AM · Restricted Project

Sep 11 2020

navdeepkk updated the summary of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 11 2020, 8:56 PM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Remove auto from appropriate places in TestAffineLoopParametricTiling.cpp.

Sep 11 2020, 9:41 AM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on diff 290923.

Sep 11 2020, 6:26 AM · Restricted Project

Sep 10 2020

navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on diff 290734.

Sep 10 2020, 3:21 AM · Restricted Project

Sep 9 2020

navdeepkk updated the summary of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 9 2020, 7:05 AM · Restricted Project
navdeepkk updated the diff for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.

Address comments on initial diff 290671.

Sep 9 2020, 7:01 AM · Restricted Project
navdeepkk edited reviewers for D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling, added: bondhugula; removed: nicolasvasilache.
Sep 9 2020, 2:01 AM · Restricted Project
navdeepkk requested review of D87353: [MLIR][Affine] Add parametric tile size support for affine.for tiling.
Sep 9 2020, 1:58 AM · Restricted Project