- User Since
- Oct 9 2015, 4:06 AM (206 w, 3 d)
Tue, Aug 27
Aug 7 2019
Hmm, those test changes were unexpected.
Aug 5 2019
Sure, why not.
Mostly seems reasonable to me, but two questions inline.
Jul 25 2019
This needs a test. It should be possible to extract one from the bug reports?
I do prefer the explicit isDivergentUse!
Jul 24 2019
LGTM. Followup for WQM/WWM sounds good to me as well.
Thanks for doing this. For the codegen quality question, I wonder if something like the following could be done:
Also fixes support for targets without i16.
Nice work! It's missing tests though -- I'm not sure if we want to print this additional information in the divergence printer, but definitely a test for the atomic optimizer pass is required.
Jul 23 2019
Based on a quick check this should be safe for all graphics clients. LGTM.
LGTM, though I believe the hyperlinks in :ref: need to be without the underscore (and other examples of :ref: use seem to corroborate that).
Just to clarify, how is selection of global_load_ubyte and friends going to work? I assume similar to today where the load returns an s32 value, but instruction selection does matching based on the MemOperand remembering the size?
Jul 22 2019
One more thought:
Okay, the possibility of an AssertZext is an interesting point. So let me try the other way around: What would the MIR at this stage look like to enforce an and?
Okay thanks, I see the logic now.
My understanding is that this is mostly related to CWSR. The trap handler has to be able to "replay" the GWS instruction.
The AMDGPU changes seem fine to me overall.
Thanks! Could you please also add a test to Analysis/ValueTracking/memory-dereferenceable.ll?
Jul 19 2019
How about the following simpler logic:
Have you checked that this actually fixes the reported CTS failure?
Jul 18 2019
But then following this logic, I still think that by analogy with G_ZEXT, the operation of COPY from s1 into vcc should have the semantics of ignoring the high bits of the "s1 which is really an s32". Since there's nothing in the MIR test which guarantees that the incoming high bits of $sgpr0 are 0, the resulting code needs to have some form of masking.
That seems reasonable to me.
Jul 17 2019
Jul 16 2019
Add missing test changes
Add test case and remove the legalizer part.
I'm taking this over.
This seems incorrect, doesn't it? The truncation disappeared.... (e.g., what if $sgpr0 is 0x10)
Oh, is that because the new node causes the intrinsics to be lowered to G_FMINNUM etc.? Why doesn't this affect any other targets?
Why are you removing the testcases for the intrinsics?
Yeah, let's take this.
Would obviously be good to fix the underlying issue, but sure, this seems reasonable.
Jul 12 2019
You should probably wait a bit in case somebody else wants to chime in, but this looks good to me.
Jul 2 2019
Jul 1 2019
Address review comments
I'm currently looking into it.
One nit, apart from that LGTM.
Would it be possible to make default operands overridable automatically iff they are at the end of the operand list? I.e., if you have a suffix of default operands, then those can be overridden?
Thanks for dealing with this. Matt's suggestion is reasonable to me, either way LGTM.
Jun 28 2019
Properly test based on wavefront size
Add a test case
Jun 27 2019
I'd prefer the alternative fix at D63871, since it doesn't require RPOT.
This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).
Jun 26 2019
Oh... radv declares compute LDS as an LLVM global variable but doesn't use rtld yet, right? Sorry, I missed that.
Jun 25 2019
Trivial nitpick, but essentially LGTM.
Jun 24 2019