- User Since
- Jan 7 2020, 7:38 AM (5 w, 5 d)
Fri, Feb 14
Cool, great start!
Thu, Feb 13
Some more comments addressed.
The answer is simple. I did not rebase :)
Commit message fixed.
Noticed this because the loop.parallel transformation bailed out on these.
Wed, Feb 12
https://reviews.llvm.org/D74480 for the mlir-cuda-runner changes. I had to include many dependencies that I assume should come with the JitRunner. Not sure what the overall goal is but this makes things build.
include files for the win :-)
Tue, Feb 11
Feel free to remove the changes to log. But with your change it seems even less consistent than before.
Comments, comments, comments :-)
Some small comments. With changes syntax this looks great.
Writing this in code for discussion. Is this the way to resolve conflicts or do we want to hand-pick patterns from the LLVM conversion? The latter seems brittle as well, as the number of pattern keeps growing and we need to track. By disallowing, we can do this whenever we add a new intrinsic.
Explicit lambda instead of std::bind.
Mon, Feb 10
Thanks! This looks great. I had written some comments but forgot to hit the submit button it seems :(
Thanks for starting this. I'd be happy to see this land.
Thank you Christian for adding this!
Fri, Feb 7
Correct tests and some minor cleanup.
Thu, Feb 6
Also support linalg tiled loops
Wed, Feb 5
I am setting up a build bot that will build the tree with the cuda parts enabled so we can see this easier.
Great to see progress on this! Some comments.
https://reviews.llvm.org/D74041 should fix the link errors.
There is probably a prettier way to do this but this unblocks building.
Tue, Feb 4
@rriddle Is this what you had in mind?
Remove attribute from td file and minor cleanup.
Remove unneeded import.
Made it a template with help from @pifon2a.
Let's use dynamicallyIllegalOp instead to disallow the CPU intrinsic explicitly. That would also make the non-determinism go away by not relying on pattern order anymore. And it also captures the actual issue that the intrinsics call is indeed illegal.
This seems good to me but the tests are failing. From just looking at the code not sure why.
Mon, Feb 3
Maybe this should even be a template to cover more cases like BlockArguments.
I started implementing the lowering of parallel loops to gpu launch code. This code uses mapping annotations in the form of attributes. Currently, one can only map to a single block/thread id. Also, upper bounds for the launch can be defined based on the number of iterations of the mapped iteration of the parallel loop. This is probably not what we want in the long run but a good starting point to inform the design we take.
Looks good to me. I wonder about users of TiledLinAlgOp. Are there none? This should be a breaking change.
Fri, Jan 31
Whoohoo. \0/ Thanks!
I went for lifting the requirement that gpu.launch needs to be closed from above. So we no longer have any arguments.
Thu, Jan 30
Wed, Jan 29
PTAL. I will do another breaking change that removes the use of region arguments in a followup.
Tue, Jan 28
Split out gpu dialect cleanup parts.
Mon, Jan 27
Thank you for adding this!
Fri, Jan 24
Mon, Jan 20
Jan 15 2020
I have reviewed this before (the basic approach has not changed). Thanks for adding more comments and tests.
Jan 14 2020
Jan 13 2020
I browsed over the code and left some minor comments. The idea and code in verifiers LGTM. I find it much harder to read now with all those similar sounding accessors but that is inevitable I assume if you want to support mixed forms.