- User Since
- Oct 9 2015, 4:06 AM (197 w, 3 d)
Okay, the possibility of an AssertZext is an interesting point. So let me try the other way around: What would the MIR at this stage look like to enforce an and?
Okay thanks, I see the logic now.
My understanding is that this is mostly related to CWSR. The trap handler has to be able to "replay" the GWS instruction.
The AMDGPU changes seem fine to me overall.
Thanks! Could you please also add a test to Analysis/ValueTracking/memory-dereferenceable.ll?
Fri, Jul 19
How about the following simpler logic:
Have you checked that this actually fixes the reported CTS failure?
Thu, Jul 18
But then following this logic, I still think that by analogy with G_ZEXT, the operation of COPY from s1 into vcc should have the semantics of ignoring the high bits of the "s1 which is really an s32". Since there's nothing in the MIR test which guarantees that the incoming high bits of $sgpr0 are 0, the resulting code needs to have some form of masking.
That seems reasonable to me.
Wed, Jul 17
Tue, Jul 16
Add missing test changes
Add test case and remove the legalizer part.
I'm taking this over.
This seems incorrect, doesn't it? The truncation disappeared.... (e.g., what if $sgpr0 is 0x10)
Oh, is that because the new node causes the intrinsics to be lowered to G_FMINNUM etc.? Why doesn't this affect any other targets?
Why are you removing the testcases for the intrinsics?
Yeah, let's take this.
Would obviously be good to fix the underlying issue, but sure, this seems reasonable.
Fri, Jul 12
You should probably wait a bit in case somebody else wants to chime in, but this looks good to me.
Tue, Jul 2
Mon, Jul 1
Address review comments
I'm currently looking into it.
One nit, apart from that LGTM.
Would it be possible to make default operands overridable automatically iff they are at the end of the operand list? I.e., if you have a suffix of default operands, then those can be overridden?
Thanks for dealing with this. Matt's suggestion is reasonable to me, either way LGTM.
Fri, Jun 28
Properly test based on wavefront size
Add a test case
Thu, Jun 27
I'd prefer the alternative fix at D63871, since it doesn't require RPOT.
This does seem useful, although the description is overly narrow (what does nosync on its own have to do with freeing memory?).
Wed, Jun 26
Oh... radv declares compute LDS as an LLVM global variable but doesn't use rtld yet, right? Sorry, I missed that.
Tue, Jun 25
Trivial nitpick, but essentially LGTM.
Mon, Jun 24
Jun 17 2019
- region pointers cannot appear as flat pointers
- expand the tests, handle the cmpxchg case
Address review comment
Good idea, but needs a fix I think.
I should clarify that I'm not opposed to this kind of change in the long run, but as-is it needs a careful look at the performance implications.
What does this actually fix?
I don't quite understand why you think this can't be writeonly? (The flip side is, I don't see how having this writeonly would help codegen)
Jun 16 2019
Wrap LDS TargetGlobalAddress nodes in an AMDGPUISD::LDS node that is
selected to S_MOV/V_MOV. This should address the concerns about having
more machine nodes during SelectionDAG, and as a nice side bonus it
actually improves the quality of known-bits information by adding
alignment information as well.
Rebase, address some of the comments.
Significant update to this patch to make .amdgpu_lds work more
like common symbols:
Thanks, this is very helpful!