- User Since
- Oct 9 2015, 4:06 AM (118 w, 5 d)
Nov 28 2017
Nov 22 2017
I should add that apart from the one comment I had, the change looks good to me. It would be great if we could avoid those added custom SD nodes somehow, though...
Nov 21 2017
Nov 16 2017
Fine with me.
Oct 24 2017
Oct 23 2017
Can the same be achieved by implementing getTgtMemIntrinsic?
One minor comment inline.
One comment inline, apart from that LGTM. I imagine some of those additional spills could be improved with better register allocation, and that's a bullet someone has to bite eventually, but not now.
P.S.: It would help to use ReviewBoard's feature of defining dependent (parent/child) revisions, like I've just done now. It sucks that this is not done automatically, unfortunately I haven't found a good way to do proper reviews of series with it yet. (This is my mine gripe with all those shiny tools written by web hipsters...)
Some small comments inline. More generally, I think stuff like this belongs in a MIR-level InstCombine pass. We just don't have the infrastructure for that yet, and this patch is a nice improvement in the meantime.
That should indeed help a lot :)
I agree the naming of the intrinsics is a bit unfortunate, but it is what it is, and it's not the end of the world. Please remove speculatable. Other than that, LGTM.
You're right, on second thought the existing intrinsic has the same problem. So this is still a strict improvement.
Oct 11 2017
Oct 10 2017
A bunch of inline comments, but also some higher level things that don't really fit anywhere. This is a useful feature, but I don't think we've ever gotten the design of kill just right, because kill is really an implicit control flow intrinsic.
Apart from Matt's comments, this looks good to me.
Oct 2 2017
Mostly it took me a while to grok why the function was looking at MAX instructions. As far as I understand now, it's because the code that deduces the clamp output modifier happens to always produce it on a MAX(x,x) instruction, and not a MIN.
Sep 29 2017
Use alignDown instead of magic 4092.
I'm not too familiar with the asm printer stuff, but this all looks reasonable to me.
Maybe you could add a helpful comment on isClamp? Either way, LGTM.
Sep 14 2017
Updated commit message: the highest value that can be represented with
immediate + inline is 4156 = 4092 + 64.
Sep 6 2017
How about in unittests/ADT/TripleTest.cpp?
Sep 4 2017
TripleTest.cpp has some minimally meaningful tests though. So a test of amdgcn-amd-amdpal should be added there. Apart from that the patch looks good.
Aug 28 2017
Aug 8 2017
Aug 4 2017
Some comments on what I noticed. Probably best if somebody else has a look as well.
Aug 3 2017
Aug 2 2017
One comment, apart from that LGTM.
One remaining style nitpick. LGTM with that fixed.
Thanks. Assuming we agree on the derivative calculations logic I added in a comment, this LGTM.
Aug 1 2017
Jul 21 2017
Looks mostly good, but I do have a couple of comments.
Jul 20 2017
One question, apart from that looks good.
One minor comment. Either way, LGTM.
Some minor comments. In addition, I think it can be simplified, and we probably want the intrinsic to be convergent, because sinking WQM computations into a non-uniform branch could mean that the computation becomes non-WQM for practical purposes.
Jul 14 2017
This was submitted long time ago.
Jun 28 2017
So, one thing that's not clear to me with is the semantics of how the update.dpp intrinsic is supposed to enable WQM or WWM. In your sequence of instructions, if you just put a WQM/WWM flag on the update.dpp intrinsic, how does LLVM know whether the regular ALU intrinsics in between should run in WQM/WWM or not?
May 18 2017
May 8 2017
Low & high bits and use cases aside, isn't this something that could be covered by adding support for PC in the read_register intrinsic?
Apr 24 2017
Address review comments and apply some more formatting fixes.
Address review comments.
I don't see anything wrong with the code.
Apr 21 2017
Apr 19 2017
There might be a benefit to choosing a larger alignment, like 64 bytes, due to cache line alignment.
Apr 8 2017
When applied to current trunk, this code can get into an infinite loop. Please see the sample LLVM IR at https://paste.debian.net/926533/
Feb 15 2017
Feb 10 2017
Feb 1 2017
I haven't looked in too much detail yet. I assume getelementptr doesn't work with these pointers, so it would be good to have a negative test which ensures that GEP use fails.
I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.
Jan 30 2017
Backing out the effectively dead FMF code, but leave the corresponding
FIXMEs in place.
Jan 27 2017
Yeah, I personally feel more comfortable having the exact same change on the branch.
Drop unnecessary getNode()s.
But you can't because they'll crash [...]
@arsenm, unfortunately I can't add flag-based tests, since as @hfinkel points out, you can't actually get the flag on an FMA or FMAD in the DAG. @hfinkel, the idea was to put the check in already anyway, since by the discussion with @spatel, the plan is to add FMF for all nodes eventually. That seems reasonable to me.
Formatting of FIXMEs
Jan 23 2017
Jan 20 2017
Added the DefaultFlags. It does fix @spatel's example for me. Anything else?
Jan 19 2017
Obviously it'd be even easier to skip the getFlags() thing entirely now, since it can never evaluate to true. But the right way forward is to extend support for FMF, so...
Add FIXMEs. @hfinkel, do I understand your comment correctly that this patch is
good to go?
Jan 17 2017
- Take individual UnsafeAlgebra flags into account.
Jan 13 2017
Thanks! We generally use CHECK-LABEL only for the start of a function, not for labels inside a function. The idea is that the label helps FileCheck get less confused when there are many sub-tests in a single test file.