Page MenuHomePhabricator

herhut (Stephan Herhut)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 7 2020, 7:38 AM (5 w, 5 d)

Recent Activity

Fri, Feb 14

herhut requested changes to D74544: [MLIR] Add naive fusion of parallel loops..

Cool, great start!

Fri, Feb 14, 4:17 AM · Restricted Project

Thu, Feb 13

herhut committed rG715783d415fe: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch. (authored by herhut).
[MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch.
Thu, Feb 13, 7:57 AM
herhut closed D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..
Thu, Feb 13, 7:57 AM · Restricted Project
herhut updated the diff for D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..

Some more comments addressed.

Thu, Feb 13, 6:08 AM · Restricted Project
herhut added inline comments to D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..
Thu, Feb 13, 6:08 AM · Restricted Project
herhut abandoned D74543: [MLIR][Affine] Mark affine.min and affine.max as NoSideffect..

The answer is simple. I did not rebase :)

Thu, Feb 13, 6:08 AM · Restricted Project
herhut updated the diff for D74543: [MLIR][Affine] Mark affine.min and affine.max as NoSideffect..

Commit message fixed.

Thu, Feb 13, 4:37 AM · Restricted Project
herhut added a reviewer for D74543: [MLIR][Affine] Mark affine.min and affine.max as NoSideffect.: ftynse.

Noticed this because the loop.parallel transformation bailed out on these.

Thu, Feb 13, 4:37 AM · Restricted Project
herhut created D74543: [MLIR][Affine] Mark affine.min and affine.max as NoSideffect..
Thu, Feb 13, 4:34 AM · Restricted Project

Wed, Feb 12

herhut added a comment to D74472: Fix MLIR build when the NVPTX target isn't configured.

https://reviews.llvm.org/D74480 for the mlir-cuda-runner changes. I had to include many dependencies that I assume should come with the JitRunner. Not sure what the overall goal is but this makes things build.

Wed, Feb 12, 6:14 AM · Restricted Project
herhut committed rG864110b5b499: [MLIR][CUDA] Fix build file for mlir-cuda-runner (authored by herhut).
[MLIR][CUDA] Fix build file for mlir-cuda-runner
Wed, Feb 12, 6:13 AM
herhut closed D74480: [MLIR][CUDA] Fix build file for mlir-cuda-runner.
Wed, Feb 12, 6:13 AM · Restricted Project
herhut added a reviewer for D74480: [MLIR][CUDA] Fix build file for mlir-cuda-runner: ftynse.

PTAL

Wed, Feb 12, 5:45 AM · Restricted Project
herhut created D74480: [MLIR][CUDA] Fix build file for mlir-cuda-runner.
Wed, Feb 12, 5:45 AM · Restricted Project
herhut added a reviewer for D74472: Fix MLIR build when the NVPTX target isn't configured: herhut.
Wed, Feb 12, 2:51 AM · Restricted Project
herhut accepted D74469: Add '#include <functional>` to PassManager.h..

include files for the win :-)

Wed, Feb 12, 2:33 AM · Restricted Project

Tue, Feb 11

herhut added a reviewer for D74406: Add RsqrtOp to LLVM dialect.: ftynse.

Feel free to remove the changes to log. But with your change it seems even less consistent than before.

Tue, Feb 11, 7:05 AM · Restricted Project
herhut committed rG890d5e2dd232: [MLIR][GPU] Disallow llvm tanh intrinsics when lowering to NVVM/ROCm. (authored by herhut).
[MLIR][GPU] Disallow llvm tanh intrinsics when lowering to NVVM/ROCm.
Tue, Feb 11, 6:19 AM
herhut closed D74389: [MLIR][GPU] Disallow llvm tanh intrinsics when lowering to NVVM/ROCm..
Tue, Feb 11, 6:19 AM · Restricted Project
herhut added a comment to D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..

(Sorry for jumping on this late)

This is now adding three different lowering of loops to GPU

  1. From perfectly nested loops to GPU
  2. From imperfectly nested loops to GPU
  3. Parallel loops to GPU I confess I contributed to one of them (2). We should probably have a plan to consolidate these. This is too much duplication of functionality. We should probably deprecate (2) in favor of this change, and adapt all uses accordingly.
Tue, Feb 11, 6:19 AM · Restricted Project
herhut updated the diff for D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..

Comments, comments, comments :-)

Tue, Feb 11, 5:43 AM · Restricted Project
herhut added a comment to D74288: [MLIR][Affine] Add affine.parallel op.

Some small comments. With changes syntax this looks great.

Tue, Feb 11, 2:45 AM · Restricted Project
herhut updated the diff for D74389: [MLIR][GPU] Disallow llvm tanh intrinsics when lowering to NVVM/ROCm..

Drop llvm.

Tue, Feb 11, 2:22 AM · Restricted Project
herhut added a comment to D74389: [MLIR][GPU] Disallow llvm tanh intrinsics when lowering to NVVM/ROCm..

This would work for me. At scale, we may need to analyze effects of such approach on compile time. The converter essentially twice the number of patterns it has to search through, and there are potential rollbacks of rewrites that generated illegal operations.

In general, I think we want to expose individual LLVM patterns anyway because several people hit the selection problem repeatedly. I haven't had time to do that yet.

Tue, Feb 11, 2:22 AM · Restricted Project
herhut added a reviewer for D74389: [MLIR][GPU] Disallow llvm tanh intrinsics when lowering to NVVM/ROCm.: ftynse.

Writing this in code for discussion. Is this the way to resolve conflicts or do we want to hand-pick patterns from the LLVM conversion? The latter seems brittle as well, as the number of pattern keeps growing and we need to track. By disallowing, we can do this whenever we add a new intrinsic.

Tue, Feb 11, 1:35 AM · Restricted Project
herhut updated the diff for D74389: [MLIR][GPU] Disallow llvm tanh intrinsics when lowering to NVVM/ROCm..

Explicit lambda instead of std::bind.

Tue, Feb 11, 1:35 AM · Restricted Project
herhut created D74389: [MLIR][GPU] Disallow llvm tanh intrinsics when lowering to NVVM/ROCm..
Tue, Feb 11, 1:35 AM · Restricted Project

Mon, Feb 10

herhut added inline comments to D74270: [mlir][GPUToSPIRV] Modify the lowering of gpu.block_dim to be consistent with Vulkan SPEC.
Mon, Feb 10, 5:03 AM · Restricted Project
herhut added a comment to D74174: [MLIR] Allow Loop dialect IfOp and ForOp to define values .

Thanks! This looks great. I had written some comments but forgot to hit the submit button it seems :(

Mon, Feb 10, 1:26 AM · Restricted Project
herhut accepted D73944: [mlir][wip] Start Shape dialect.

Thanks for starting this. I'd be happy to see this land.

Mon, Feb 10, 1:26 AM · Restricted Project
herhut added a comment to D74200: config for mlir-nvidia buildbot.

Thank you Christian for adding this!

Mon, Feb 10, 12:50 AM

Fri, Feb 7

herhut updated the diff for D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..

Correct tests and some minor cleanup.

Fri, Feb 7, 4:34 AM · Restricted Project

Thu, Feb 6

herhut updated the diff for D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..

Also support linalg tiled loops

Thu, Feb 6, 10:20 AM · Restricted Project
herhut added a comment to D74012: [mlir][spirv] Use spv.entry_point_abi in GPU to SPIR-V conversions.

I understand what the intent is here, but the input already has an attribute that belongs to the SPIR-V dialect before lowering.

Generally I think it is inevitable that we'll have attributes belonging to lower layers attached at the source input at a higher layer, for example we will have spv.target_env and spv.interface_var_abi attributes attached to dispatch regions isolated by IREE at HLO level. The root problem is that we cannot infer everything a lower dialect needs all from higher dialects and it does not make sense to create duplicates all the way up to every layer. But with that said,

That makes things a bit non-composable. In cases where someone lowers to the GPU dialect and then conditionally decides to lower to SPIR-V dialect or the NVVM dialect, with this change on the SPIR-V side a separate pass will be needed to add this attribute. Ideally the input should be only in GPU dialect, whereas here it isnt.

Here we are at the boundary between GPU dialect and SPIR-V dialect; so it should be fine to have SPIR-V specific stuff attached to the input to drive conversions towards SPIR-V. But I get your idea here regarding non-composability. If the input is, say, loops, it's better to have a proper GPU dialect attribute instead of spv.entry_point_abi attached to loops for driving further conversion.

Is it possible instead to add an attribute to GPU dialect itself which contains information about the workgroup size. Then while lowering we can convert one attribute to another.

Yeah that makes sense to me. At GPU level we also have such concepts so we can have similar attributes; they are just contracts to different layers. Right now we are using a bunch of command-line options for doing that job; I'd love to see us switching to use attributes there too. I've created https://llvm.discourse.group/t/using-attributes-to-specify-workgroup-configuration-when-lowering-to-gpu/496 to RFC. I view that as an upper layer above SPIR-V so it's a bit separated from the changes here IMHO.

Thu, Feb 6, 2:43 AM · Restricted Project
Herald added a reviewer for D74041: [MLIR][GPU] Fix build files for mlir-opt.: mravishankar.

I am setting up a build bot that will build the tree with the cuda parts enabled so we can see this easier.

I updated our build kite bot accordingly already: https://buildkite.com/mlir/mlir-core

Thu, Feb 6, 1:23 AM · Restricted Project

Wed, Feb 5

herhut added a comment to D74041: [MLIR][GPU] Fix build files for mlir-opt..

I am setting up a build bot that will build the tree with the cuda parts enabled so we can see this easier.

Wed, Feb 5, 9:27 AM · Restricted Project
herhut added a comment to D73944: [mlir][wip] Start Shape dialect.

Great to see progress on this! Some comments.

Wed, Feb 5, 9:18 AM · Restricted Project
herhut added a comment to D73653: [MLIR] Fixes for shared library dependencies..

https://reviews.llvm.org/D74041 should fix the link errors.

Wed, Feb 5, 6:57 AM · Restricted Project
herhut committed rGe1e09f0ce6d4: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper. (authored by herhut).
[MLIR] Add mapping based on ValueRange to BlockAndValueMapper.
Wed, Feb 5, 6:54 AM
herhut closed D73894: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper..
Wed, Feb 5, 6:54 AM · Restricted Project
herhut updated the diff for D73894: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper..

use is_assignable

Wed, Feb 5, 5:41 AM · Restricted Project
herhut added reviewers for D74041: [MLIR][GPU] Fix build files for mlir-opt.: stephenneuendorffer, ftynse.

There is probably a prettier way to do this but this unblocks building.

Wed, Feb 5, 4:55 AM · Restricted Project
herhut created D74041: [MLIR][GPU] Fix build files for mlir-opt..
Wed, Feb 5, 4:55 AM · Restricted Project
herhut added a comment to D73713: Fixed non-deterministic GPU intrinsic lowering..

Let's use dynamicallyIllegalOp instead to disallow the CPU intrinsic explicitly. That would also make the non-determinism go away by not relying on pattern order anymore. And it also captures the actual issue that the intrinsics call is indeed illegal.

IIUC: This does not address the underlying issue with the framework though, it seems to still be in the category of "hiding the bug".

Wed, Feb 5, 1:44 AM · Restricted Project

Tue, Feb 4

herhut updated the diff for D73894: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper..

Formatting.

Tue, Feb 4, 10:30 AM · Restricted Project
herhut added a comment to D73894: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper..

@rriddle Is this what you had in mind?

Tue, Feb 4, 9:26 AM · Restricted Project
herhut updated the diff for D73894: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper..

Less templates.

Tue, Feb 4, 9:17 AM · Restricted Project
herhut updated the diff for D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..

Remove attribute from td file and minor cleanup.

Tue, Feb 4, 4:52 AM · Restricted Project
herhut added a comment to D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..

I suppose you want high-level feedback on this, so I'm not nitpicking in the code.

Tue, Feb 4, 4:52 AM · Restricted Project
herhut updated the diff for D73894: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper..

Remove unneeded import.

Tue, Feb 4, 4:25 AM · Restricted Project
herhut updated the diff for D73894: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper..

Made it a template with help from @pifon2a.

Tue, Feb 4, 4:16 AM · Restricted Project
herhut added a comment to D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..
  1. one of the problems you solve here is to go from type (the affine maps for mapping) to values (tidx, bidy, etc)). This is a recurrent transition that would be great to try and make retargetable (e.g. OpenMP or other runtimes), I would love to see this separated from GPU. This would also force separating concerns a bit more re outlining of kernel vs mapping of loops.
Tue, Feb 4, 2:25 AM · Restricted Project
herhut added a comment to D73713: Fixed non-deterministic GPU intrinsic lowering..

Let's use dynamicallyIllegalOp instead to disallow the CPU intrinsic explicitly. That would also make the non-determinism go away by not relying on pattern order anymore. And it also captures the actual issue that the intrinsics call is indeed illegal.

Tue, Feb 4, 2:20 AM · Restricted Project
herhut added a comment to D73794: [MLIR] Remove all-reduce lowering from GPU to NVVM. Use in-dialect lowering instead..

This seems good to me but the tests are failing. From just looking at the code not sure why.

Tue, Feb 4, 2:06 AM · Restricted Project

Mon, Feb 3

herhut created D73894: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper..
Mon, Feb 3, 7:43 AM · Restricted Project
herhut added a comment to D73894: [MLIR] Add mapping based on ValueRange to BlockAndValueMapper..

Maybe this should even be a template to cover more cases like BlockArguments.

Mon, Feb 3, 7:43 AM · Restricted Project
herhut added a reviewer for D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch.: ftynse.

I started implementing the lowering of parallel loops to gpu launch code. This code uses mapping annotations in the form of attributes. Currently, one can only map to a single block/thread id. Also, upper bounds for the launch can be defined based on the number of iterations of the mapped iteration of the parallel loop. This is probably not what we want in the long run but a good starting point to inform the design we take.

Mon, Feb 3, 7:43 AM · Restricted Project
herhut created D73893: [MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch..
Mon, Feb 3, 7:24 AM · Restricted Project
herhut accepted D73872: [MLIR][Linalg] Use GenericLoopNestRangeBuilder in tiling code..

Looks good to me. I wonder about users of TiledLinAlgOp. Are there none? This should be a breaking change.

Mon, Feb 3, 4:52 AM · Restricted Project
herhut committed rG283b5e733d1b: [MLIR] Make gpu.launch implicitly capture uses of values defined above. (authored by herhut).
[MLIR] Make gpu.launch implicitly capture uses of values defined above.
Mon, Feb 3, 1:23 AM
herhut closed D73769: Make gpu.launch implicitly capture uses of values defined above..
Mon, Feb 3, 1:23 AM · Restricted Project
herhut updated the diff for D73769: Make gpu.launch implicitly capture uses of values defined above..

Review comments.

Mon, Feb 3, 1:14 AM · Restricted Project

Fri, Jan 31

herhut added a comment to D73620: Add 'gpu.terminator' operation..

Why was this committed while we're discussing here?

Fri, Jan 31, 3:44 AM · Restricted Project
herhut committed rG84695dd4d788: Fix conversion of loops to GPU with no block/thread dimensions. (authored by herhut).
Fix conversion of loops to GPU with no block/thread dimensions.
Fri, Jan 31, 2:16 AM
herhut closed D73685: Fix conversion of loops to GPU with no block/thread dimensions..
Fri, Jan 31, 2:16 AM · Restricted Project
herhut accepted D73684: [MLIR][Linalg] Lower linalg.generic to ploops..

Whoohoo. \0/ Thanks!

Fri, Jan 31, 2:15 AM · Restricted Project
herhut abandoned D73465: Add gpu::LaunchOp::addKernelArgument..

I went for lifting the requirement that gpu.launch needs to be closed from above. So we no longer have any arguments.

Fri, Jan 31, 1:38 AM · Restricted Project
herhut added a reviewer for D73769: Make gpu.launch implicitly capture uses of values defined above.: ftynse.
Fri, Jan 31, 1:37 AM · Restricted Project
herhut created D73769: Make gpu.launch implicitly capture uses of values defined above..
Fri, Jan 31, 1:37 AM · Restricted Project

Thu, Jan 30

herhut committed rG26927518955d: Add 'gpu.terminator' operation. (authored by herhut).
Add 'gpu.terminator' operation.
Thu, Jan 30, 3:52 AM
herhut closed D73620: Add 'gpu.terminator' operation..
Thu, Jan 30, 3:51 AM · Restricted Project
herhut added inline comments to D73685: Fix conversion of loops to GPU with no block/thread dimensions..
Thu, Jan 30, 3:38 AM · Restricted Project
herhut accepted D73689: [Linalg] Format Linalg/fusion.mlir..

Thanks!

Thu, Jan 30, 3:36 AM · Restricted Project
herhut edited reviewers for D73685: Fix conversion of loops to GPU with no block/thread dimensions., added: ftynse; removed: nicolasvasilache.
Thu, Jan 30, 2:39 AM · Restricted Project
herhut created D73685: Fix conversion of loops to GPU with no block/thread dimensions..
Thu, Jan 30, 2:38 AM · Restricted Project

Wed, Jan 29

herhut added inline comments to D73620: Add 'gpu.terminator' operation..
Wed, Jan 29, 6:08 AM · Restricted Project
herhut added a reviewer for D73620: Add 'gpu.terminator' operation.: ftynse.

PTAL. I will do another breaking change that removes the use of region arguments in a followup.

Wed, Jan 29, 5:04 AM · Restricted Project
herhut created D73620: Add 'gpu.terminator' operation..
Wed, Jan 29, 5:04 AM · Restricted Project

Tue, Jan 28

herhut updated the diff for D73465: Add gpu::LaunchOp::addKernelArgument..

Split out gpu dialect cleanup parts.

Tue, Jan 28, 3:18 AM · Restricted Project
herhut added inline comments to D73465: Add gpu::LaunchOp::addKernelArgument..
Tue, Jan 28, 3:18 AM · Restricted Project
herhut added a comment to D73465: Add gpu::LaunchOp::addKernelArgument..

Code motion into a gpu::LaunchOp region requires operands of moved instructions to be threaded through operands of the gpu.launch to ensure that the gpu.launch remaines closed from above.

Can we revisit lifting this requirement? This was something that was raised from the beginning of the development of this op.

Tue, Jan 28, 3:09 AM · Restricted Project
herhut committed rGfdcecefe30d8: Add lowering for loop.parallel to cfg. (authored by herhut).
Add lowering for loop.parallel to cfg.
Tue, Jan 28, 3:00 AM
herhut closed D73348: Add lowering for loop.parallel to cfg..
Tue, Jan 28, 3:00 AM · Restricted Project
herhut accepted D73535: Changed wrong ROCDL instructions in GPU lowering..

Nice catch!

Tue, Jan 28, 2:24 AM · Restricted Project

Mon, Jan 27

herhut added inline comments to D73465: Add gpu::LaunchOp::addKernelArgument..
Mon, Jan 27, 11:16 AM · Restricted Project
herhut accepted D73471: Add tanh lowering from Standard dialect to NVVM and ROCDL..

Thank you for adding this!

Mon, Jan 27, 11:16 AM · Restricted Project
herhut added inline comments to D73348: Add lowering for loop.parallel to cfg..
Mon, Jan 27, 6:33 AM · Restricted Project
herhut updated the diff for D73348: Add lowering for loop.parallel to cfg..

Review comments...

Mon, Jan 27, 6:31 AM · Restricted Project
herhut created D73465: Add gpu::LaunchOp::addKernelArgument..
Mon, Jan 27, 5:35 AM · Restricted Project

Fri, Jan 24

herhut added a reviewer for D73348: Add lowering for loop.parallel to cfg.: ftynse.
Fri, Jan 24, 6:22 AM · Restricted Project
herhut created D73348: Add lowering for loop.parallel to cfg..
Fri, Jan 24, 6:22 AM · Restricted Project

Mon, Jan 20

herhut accepted D72921: Create a gpu.module operation for the GPU Dialect..

Thanks.

Mon, Jan 20, 2:25 AM · Restricted Project
herhut removed a child revision for D72336: [mlir] Create a gpu.module operation for the GPU Dialect.: D72921: Create a gpu.module operation for the GPU Dialect..
Mon, Jan 20, 2:07 AM · Restricted Project
herhut removed a parent revision for D72921: Create a gpu.module operation for the GPU Dialect.: D72336: [mlir] Create a gpu.module operation for the GPU Dialect..
Mon, Jan 20, 2:07 AM · Restricted Project
herhut added a parent revision for D72921: Create a gpu.module operation for the GPU Dialect.: D72336: [mlir] Create a gpu.module operation for the GPU Dialect..
Mon, Jan 20, 2:07 AM · Restricted Project
herhut added a child revision for D72336: [mlir] Create a gpu.module operation for the GPU Dialect.: D72921: Create a gpu.module operation for the GPU Dialect..
Mon, Jan 20, 2:07 AM · Restricted Project

Jan 15 2020

herhut accepted D72129: [mlir] Add in-dialect lowering of gpu.all_reduce..

I have reviewed this before (the basic approach has not changed). Thanks for adding more comments and tests.

Jan 15 2020, 3:05 AM · Restricted Project

Jan 14 2020

herhut committed rG4624a1e8ac8a: [mlir] Create a gpu.module operation for the GPU Dialect. (authored by Tres Popp <tpopp@google.com>).
[mlir] Create a gpu.module operation for the GPU Dialect.
Jan 14 2020, 3:06 AM
herhut closed D72336: [mlir] Create a gpu.module operation for the GPU Dialect..
Jan 14 2020, 3:06 AM · Restricted Project

Jan 13 2020

herhut added a comment to D72555: [mlir][Linalg] Update the semantics, verifier and test for Linalg with tensors..

I browsed over the code and left some minor comments. The idea and code in verifiers LGTM. I find it much harder to read now with all those similar sounding accessors but that is inevitable I assume if you want to support mixed forms.

Jan 13 2020, 7:35 AM · Restricted Project