User Details
- User Since: Nov 25 2019, 10:10 AM (130 w, 4 d)
Wed, May 25
Clean up namespacing.
Rebase.
Thanks Artem. I think I should be able to land it myself.
Mon, May 23
Comment fix.
Sun, May 22
Sat, May 14
Fri, May 6
Mon, May 2
Looks good from my side, but please address Uday's comments.
Apr 26 2022
Apr 22 2022
Here is a repro:
Apr 21 2022
Apr 12 2022
async with zero async dependency tokens doesn't appear to be a meaningful configuration for the op. The lowering checks for it and fails, but should it just be disallowed?
I'm not sure. I would say the semantics are clear when an op is async (the host does not wait for the op to complete) but has no dependencies (it can run immediately without waiting for anything else). It's OK for the current lowering to be limited in what it can handle and to rely on gpu-async-region to bring the IR into a lowering-compatible form. I also like the symmetry of these ops, including gpu.wait, where gpu.wait async [] needs to be valid.
CUDA_VERSION is defined in the cuda.h header.
Apr 11 2022
The gpu-async-region pass simply chains together sequences of gpu ops, with the intention of using async.execute to separate independent work that runs on separate streams. In that setup, gpu ops can stay synchronous when lowered from higher-level dialects, because the async.execute regions specify which gpu ops run in sequence and which ones can run in parallel.
Not opposed to this change at all, but what's the motivation for allowing gpu-async-region to run before gpu-kernel-outlining?
This should be sufficient:
Apr 10 2022
This API was introduced in CUDA 11.2 (December 2020).
Should we guard this with #if CUDA_VERSION >= 11020?
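For illustration, a minimal C++ sketch of such a guard. It relies on the cuda.h convention that CUDA_VERSION encodes a release as 1000 * major + 10 * minor, so CUDA 11.2 is 11020. The helpers encodeCudaVersion and hasCuda112Api are hypothetical; the guarded branch stands in for the call to the 11.2-only API, which is not named here.

```cpp
// cuda.h defines CUDA_VERSION as 1000 * major + 10 * minor,
// e.g. 11020 for CUDA 11.2 and 10010 for CUDA 10.1.
constexpr int encodeCudaVersion(int major, int minor) {
  return 1000 * major + 10 * minor;
}
static_assert(encodeCudaVersion(11, 2) == 11020, "CUDA 11.2 must encode as 11020");

// Hypothetical helper: reports whether the CUDA 11.2-only code path was compiled in.
// In a real runtime wrapper, the guarded branch would contain the call to the new API
// and the #else branch a fallback or an error report.
constexpr bool hasCuda112Api() {
#if defined(CUDA_VERSION) && CUDA_VERSION >= 11020
  return true;
#else
  return false;  // older toolkit, or cuda.h was not included
#endif
}
```

Guarding at compile time keeps the wrappers building against older toolkits; a runtime check of the driver version would be a separate concern.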
Mar 16 2022
Thanks for taking care of this!
Mar 15 2022
Mar 14 2022
I think the gpu.wait canonicalizer could be cleaning up more cases:
Mar 13 2022
Mar 9 2022
Mar 8 2022
Fix mlir/lib/CAPI/IR/IR.cpp as well.
Mar 7 2022
Rebase.
Rebase.
Fix.
Fix.
Fix.
And again, more fixes.
More fixes.
Rebase.
Add fixes for other file moves.
Rebase.
Mar 6 2022
Feb 2 2022
We have settled on marking addf/mulf commutative as well (D118600) instead of removing the trait from minf/maxf. Abandoning this revision.
Jan 31 2022
Rebase.
Update tests.
Rebase.
Simplify 1.0 matcher, add test for minf(inf, %x) -> %x.
Also fold minf(%x, +inf) -> %x, fix test.
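Both folds rely on +inf being an identity for minf in either operand position. A minimal C++ sketch, assuming arith.minf's documented semantics (NaN-propagating, with -0.0 ordered below +0.0); the minf function here is a hypothetical scalar model, not MLIR's folder, and infIsIdentityFor checks the identity bit-for-bit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical scalar model of arith.minf: NaN-propagating,
// with -0.0 ordered below +0.0.
float minf(float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  if (a == b) return std::signbit(a) ? a : b;  // prefers -0.0 over +0.0
  return a < b ? a : b;
}

// Reinterpret a float's bits so results compare exactly (including NaN and -0.0).
std::uint32_t bitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// True if minf(x, +inf) and minf(+inf, x) both return x bit-for-bit.
bool infIsIdentityFor(float x) {
  const float inf = std::numeric_limits<float>::infinity();
  return bitsOf(minf(x, inf)) == bitsOf(x) && bitsOf(minf(inf, x)) == bitsOf(x);
}
```

Note that NaN propagation is what makes the fold safe for a NaN %x: a C-style fmin would return +inf there instead.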
Rebase.
Jan 28 2022
Rebase.
Jan 27 2022
I read all this as "keeping commutative is fine", is that right?
Add comment.
Rebase.
So it seems that you're assuming that one of the input values is returned when there is a NaN, and that which one is returned consistently depends on its position in the argument list. Is this how we define it?
Another definition could be that the payload isn't guaranteed to be carried over, and that we return an unspecified NaN that can differ from either of the arguments.
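To make the payload question concrete: under a hypothetical "return the first NaN operand" rule, swapping the operands when both are quiet NaNs with different payloads changes the bit pattern of the result. A minimal C++ sketch, assuming IEEE-754 binary32 floats; minfFirstNaN, floatFromBits and bitsOf are illustrative names only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Build a float from an explicit IEEE-754 binary32 bit pattern.
float floatFromBits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Read back a float's exact bit pattern.
std::uint32_t bitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Hypothetical NaN rule: return the first NaN operand unchanged (payload preserved).
// Non-NaN inputs use a plain comparison; signed-zero ordering is ignored here.
float minfFirstNaN(float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  return a < b ? a : b;
}
```

With quiet NaNs 0x7FC00001 and 0x7FC00002, minfFirstNaN(nan1, nan2) and minfFirstNaN(nan2, nan1) are both NaN but carry different payloads, so such a rule would not be bitwise commutative. If the result is instead an unspecified NaN, both orders would be equally valid results.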
Jan 26 2022
Rebase.
I would even say that it deserves a separate revision from the one adding folders for simple arithmetic.
Addressing reviewer comments:
Adding float value matchers and using the same pattern for int value matchers for consistency.