User Details
- User Since
- Jan 30 2014, 6:27 AM (439 w, 1 d)
Wed, Jun 29
Tue, Jun 28
Where is the discussion and rationale for this?
LG. one nit.
Mon, Jun 27
I'm not so sure, but phab thinks this is small, so no complaints:
LG
Fri, Jun 24
LG
Thu, Jun 23
Wed, Jun 22
Do we want JIT -> YES, but specializing LLVM-IR JIT.
Do we want/need PTX? I do not, but I don't mind having it. Someone will ask for it eventually.
Tue, Jun 21
^^^ so much for stored once
I think I misread it earlier. Constant & !thread_dependent *or* nosync. This makes more sense.
LG, I think :)
This breaks in the presence of synchronization.
Tue, Jun 14
FWIW, I think we should have these attributes as spelled here, just w/ proper pass manager integration which then requires an RFC.
That said, I'm not opposed to this as an incremental step, albeit a confusing one until the PM support is integrated, if we allow O1/2/3/fast.
Mon, Jun 13
Thu, Jun 9
Tue, Jun 7
This seems to be uncontroversial. LGTM
May 27 2022
May 25 2022
Cool! Can we have that for begin declare variant too :D ?
May 20 2022
May 19 2022
LGTM
May 18 2022
May 16 2022
LG
Typos in the commit message.
s/new Clang/matching Clang/ ?
LG
May 13 2022
LG, nice!
May 12 2022
LG
May 11 2022
May 10 2022
LG
LG
I think we are lacking a good way to do what I want, let's go ahead with forward declares for now.
I'm not super happy about the forward decls. Should we augment begin declare target with begin declare variant match(device={kind(nohost)}) instead?
LG
Not actually tested, is it?
Conceptually fine with me, @tra?
May 6 2022
I thought it is illegal to use align 4 loads for align 32 arguments.
Are there unresolved concerns we should address in this review?
Typo(s) in the commit message, otherwise this seems reasonable. LG
May 5 2022
I think this is fine, we can argue about the name but it won't matter much. A test would be nice though.
May 2 2022
Apr 29 2022
LG, FWIW, you can commit these right away.
LG
Apr 27 2022
For example, the SimplifyCFG pass removes a branch leading to a basic block that has an incoming value which will always trigger undefined behavior. This modifies the CFG and merges the basic blocks, which works for CPU execution. On a GPU, however, there are intrinsics like "__shfl_sync(unsigned mask, T var, int srcLane, int width=warpSize)", where the exchange of the variable occurs simultaneously for all active threads within the warp. In a CUDA/HIP kernel, the variable var passed to __shfl_sync may not be initialized, and LLVM IR treats it as undef. Currently all arguments are tagged with the noundef attribute, so the SimplifyCFG optimization described above gets applied and the kernel execution becomes ambiguous. The proposed change is therefore to skip adding the noundef attribute to arguments when a function is tagged with the convergent attribute.
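The shuffle pattern in that comment can be modeled on the host to show why an uninitialized argument is harmless there. This is only a rough sketch in plain C++, not CUDA and not the patch under review: `shfl_model`, `Lane`, `kWarpSize`, and `broadcast_demo` are hypothetical names, and `std::optional` stands in for a register that may or may not have been written.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Assumption: a 32-lane warp, as on current NVIDIA hardware.
constexpr std::size_t kWarpSize = 32;

// Hypothetical stand-in for a per-lane register; an empty optional
// models a lane whose copy of `var` was never initialized.
using Lane = std::optional<int>;

// Host-side model of a broadcast-style shuffle: every lane receives
// the value held by `srcLane`. The other lanes' inputs are never read.
std::array<Lane, kWarpSize> shfl_model(const std::array<Lane, kWarpSize>& var,
                                       std::size_t srcLane) {
    std::array<Lane, kWarpSize> result;
    result.fill(var[srcLane % kWarpSize]);
    return result;
}

// Only lane 0 initializes its value, yet every lane gets a defined result.
bool broadcast_demo() {
    std::array<Lane, kWarpSize> var{};  // all lanes start uninitialized
    var[0] = 42;
    const auto out = shfl_model(var, 0);
    for (const Lane& v : out) {
        if (!v.has_value() || *v != 42) return false;
    }
    return true;
}
```

In this model, the lanes that pass an unset value still obtain a well-defined result, which is the situation the comment argues a blanket noundef on the argument mishandles.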
Apr 21 2022
LG
Apr 19 2022
Finally, I'd suggest avoiding the undef and UB. Both argument promotion and inlining are legal behavior. I think there's no undef/UB for a single pass here.
Apr 18 2022
Apr 17 2022
Use set instead of map
Fix rebase issues, address comments
Fix rebase issues
Fix rebase errors, address comments
Address comments
Apr 15 2022
LG