User Details
- User Since: Mar 27 2014, 11:59 PM
Nov 30 2016
Aug 22 2016
LGTM after you address David's comment: A post-dominating B doesn't guarantee that execution reaching B will reach A.
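To make David's point concrete, here is a minimal sketch in Python. It is not LLVM's implementation; the CFG, node names, and the helpers `post_dominators` and `run` are hypothetical. It computes post-dominator sets with the textbook iterative dataflow and shows that A post-dominates B, yet an execution that reaches B can spin in a loop forever and never reach A.

```python
from typing import Callable, Dict, List, Set

# Hypothetical CFG: B -> L, L either loops back to itself or falls through to A, A -> Exit.
CFG: Dict[str, List[str]] = {
    "B": ["L"],
    "L": ["L", "A"],
    "A": ["Exit"],
    "Exit": [],
}


def post_dominators(succ: Dict[str, List[str]], exit_node: str) -> Dict[str, Set[str]]:
    """Iterative dataflow: pdom(n) = {n} | intersection of pdom(s) over successors s.
    Assumes every node other than exit_node has at least one successor."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}  # start from "all nodes" (greatest fixpoint)
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n}.union(set.intersection(*(pdom[s] for s in succ[n])))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def run(succ: Dict[str, List[str]], start: str,
        choose: Callable[[str, List[str]], str], max_steps: int) -> List[str]:
    """Follow one execution path; `choose` picks the successor taken at each node."""
    path = [start]
    node = start
    for _ in range(max_steps):
        if not succ[node]:
            break
        node = choose(node, succ[node])
        path.append(node)
    return path


pdom = post_dominators(CFG, "Exit")
print("A post-dominates B:", "A" in pdom["B"])  # True
# An execution that always takes L's back edge (bounded to 100 steps here).
trace = run(CFG, "B", lambda node, succs: succs[0], 100)
print("Trace reached A:", "A" in trace)  # False
```

Post-dominance only constrains paths that actually reach the exit; a non-terminating loop (or, in real code, a call that never returns) lets execution reach B without ever reaching A.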
Aug 18 2016
Yay! Thanks for testing this!
Jul 15 2016
Jul 11 2016
Jul 9 2016
Jul 8 2016
Jun 24 2016
Jun 15 2016
May 25 2016
Have you considered letting Clang (instead of a late-stage IR pass) add these ranges? These ranges are useful for some target-independent IR passes, e.g. those using ValueTracking (D4150).
May 4 2016
Apr 29 2016
Apr 28 2016
Looks great! Thanks.
Thanks! I completely missed this case :(
Awesome! Thanks Justin.
Apr 26 2016
Oh, I see what you mean now.
Hi Andrew,
Maybe change the title to "... instead of default for pure arithmetic instructions". Otherwise, LGTM!
Apr 12 2016
Thanks!
Apr 4 2016
Should we land this? It will fix PR26185.
Mar 31 2016
Other than that FIXME, pretty straightforward. LGTM
Mar 29 2016
Mar 22 2016
D18168 duplicates this and is submitted.
Thanks for working on this long overdue feature!
Mar 20 2016
All comments addressed. Submitting...
Some more minor changes.
Mar 18 2016
Address jlebar@'s comments.
Mar 16 2016
Thanks a lot for the review! I understand it's a lot of work :) I answered one of your high-level questions. I'm still OOO and will get to the other comments later.
Mar 15 2016
I am not clear on this part either. jholewinski@, can you comment on this?
Sorry, I pressed the wrong button. I meant to say "needs revision". Feel free to reclaim this patch.
Pressed the wrong button. Meant to say "needs revision". Feel free to reclaim it.
Oops... I pressed the wrong button. I meant to say "needs revision". jmolly@, feel free to reclaim it.
It doesn't work with the new alias analysis infrastructure.
Is this patch obsolete? Are you still trying to push it in?
Mar 11 2016
I am OOO and may be unable to review this until next week.
Mar 8 2016
Mar 3 2016
Feb 23 2016
Feb 20 2016
Feb 16 2016
Feb 12 2016
LGTM
Feb 11 2016
I'll defer to Justin's approval.
Feb 9 2016
Just a reminder, if you haven't done so already: double-check how the web page looks before you commit.
Feb 8 2016
Feb 5 2016
I was referring to this paragraph:
Barriers are executed on a per-warp basis as if all the threads in a warp are active. Thus, if any thread in a warp executes a bar instruction, it is as if all the threads in the warp have executed the bar instruction. All threads in the warp are stalled until the barrier completes, and the arrival count for the barrier is incremented by the warp size (not the number of active threads in the warp). In conditionally executed code, a bar instruction should only be used if it is known that all threads evaluate the condition identically (the warp does not diverge). Since barriers are executed on a per-warp basis, the optional thread count must be a multiple of the warp size.
Source: http://docs.nvidia.com/cuda/parallel-thread-execution/index.html
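As a rough illustration of the arrival-count rule in the quoted paragraph, here is a small Python model. It is not NVIDIA code; the helper names are hypothetical, and it models only the per-warp semantics as quoted (later PTX revisions may describe barriers differently for newer GPUs). It counts barrier arrivals per warp rather than per active thread.

```python
from typing import List

WARP_SIZE = 32  # assumption: a warp size of 32 threads


def arrival_count(active_threads_per_warp: List[int]) -> int:
    """Per the quoted semantics: any warp with at least one thread executing `bar`
    adds WARP_SIZE arrivals, regardless of how many of its threads are active."""
    return sum(WARP_SIZE for active in active_threads_per_warp if active > 0)


def is_valid_thread_count(n: int) -> bool:
    """The quoted text requires the optional thread count to be a multiple of the warp size."""
    return n % WARP_SIZE == 0


# A 64-thread block (two warps) where a divergent branch lets only one thread
# of warp 0 reach the barrier, while all 32 threads of warp 1 do:
print(arrival_count([1, 32]))     # 64, as if both warps had fully arrived
print(is_valid_thread_count(48))  # False: 48 is not a multiple of 32
```

Under this model, a warp in which only some threads reach `bar` still counts as fully arrived, which is one way to see why the quoted text restricts `bar` in conditional code to warps that do not diverge.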
LGTM, but do you have a test where LLVM generates wrong code if __syncthreads is not marked convergent?
Committed in r254408.
Feb 4 2016
Feb 3 2016
More comments.
Jan 30 2016
Jan 22 2016
Dec 18 2015
Dec 17 2015
Nov 29 2015
LGTM with some minor comments.
Nov 25 2015
Nov 18 2015
Nov 17 2015
Nov 10 2015
Replace the link to the raw diff with more instructions.
Simplify the command lines and header file inclusion
Nov 6 2015
Simplify the doc.
Added cuda_runtime.h.
I'll let you do that after this patch. You know those options much better than I do.