User Details
- User Since
- Apr 4 2014, 4:14 AM (424 w, 12 h)
Today
Yesterday
Add [AMDGPU] to the title.
LGTM
Wed, May 18
LGTM
Tue, May 17
LGTM
Added comments about the values used.
Sorry for the delay, thanks for doing this!
This is LGTM, except it needs a rebase: ISD::SCALAR_TO_VECTOR for Vec16 has changed since this was created.
Renamed 'Fast' used in the file to 'RelativeSpeed'.
Rebased.
Fixed rebase artifact with misplaced comment.
LGTM
Rebased.
LGTM with formatting fixed.
ping
LGTM
Code was submitted as a part of D124884. Test was submitted separately.
LGTM
Mon, May 16
Added comment about offset operand.
Changed asserts to cannot select.
Changed asserts to cannot select.
Fri, May 13
Struct version is now handled by the parent patch D124884 itself.
Switched to direct selection, which allows using 2 separate memory operands.
The patch now handles both raw and struct intrinsics.
Return false from getMemOperandsWithOffsetWidth() instead of checking mem operand.
Return false from getMemOperandsWithOffsetWidth() instead of checking mem operand.
Thu, May 12
ping
Wed, May 11
Removed support for ops wider than DWORD. See D125409.
Removed support for ops wider than DWORD. See D125409.
Tue, May 10
I do not understand how this can work. The LDS address has the component TID * 4, i.e. neighboring lanes will overwrite each other. I can only imagine a scenario with a very specific EXEC mask where it might be different.
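To make the overlap concern concrete, here is a minimal sketch of the per-lane LDS destination ranges. It assumes only what the comment states, a TID * 4 component in the LDS address. The helper names and the 64-lane count are hypothetical and chosen for illustration; this is not the hardware's full address computation.

```python
LANE_STRIDE = 4  # bytes; the "TID * 4" component mentioned in the comment


def lds_ranges(num_lanes, bytes_per_lane, lds_base=0):
    """Return the half-open [start, end) LDS byte range written by each lane."""
    return [
        (lds_base + tid * LANE_STRIDE,
         lds_base + tid * LANE_STRIDE + bytes_per_lane)
        for tid in range(num_lanes)
    ]


def has_overlap(ranges):
    """True if any two ranges share at least one byte."""
    ordered = sorted(ranges)
    return any(
        prev_end > start
        for (_, prev_end), (start, _) in zip(ordered, ordered[1:])
    )


# DWORD (4-byte) writes tile the LDS exactly; 8-byte writes collide with
# the neighboring lane's slot.
dword_overlaps = has_overlap(lds_ranges(64, 4))
wide_overlaps = has_overlap(lds_ranges(64, 8))
```

With a fixed 4-byte stride, any per-lane size above 4 bytes makes adjacent active lanes write into each other's range, which is why only a sparse EXEC mask could avoid the collision.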
Do not split voffset because inst_offset is applied to both VMEM and LDS address and voffset is not. Add a separate operand instead.
Same as in parent change D124884.
Do not split voffset because inst_offset is applied to both VMEM and LDS address and voffset is not. Add a separate operand instead.
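A rough sketch of why folding part of voffset into inst_offset is unsafe here. The two address formulas below are simplified assumptions based only on the statement that inst_offset is applied to both the VMEM and the LDS address while voffset is applied to VMEM only. The function names and the sample values are hypothetical.

```python
def vmem_address(buffer_base, voffset, inst_offset):
    # Assumed simplified model: both offsets contribute to the VMEM address.
    return buffer_base + voffset + inst_offset


def lds_address(lds_base, inst_offset, tid):
    # Assumed simplified model: only inst_offset (plus TID * 4) contributes.
    return lds_base + inst_offset + tid * 4


# Original operands: the whole offset lives in voffset.
orig_vmem = vmem_address(buffer_base=0x1000, voffset=100, inst_offset=0)
orig_lds = lds_address(lds_base=0, inst_offset=0, tid=3)

# After "splitting": the constant part is moved from voffset to inst_offset.
split_vmem = vmem_address(buffer_base=0x1000, voffset=0, inst_offset=100)
split_lds = lds_address(lds_base=0, inst_offset=100, tid=3)

# The VMEM address is unchanged, but the LDS destination moves.
vmem_unchanged = orig_vmem == split_vmem
lds_shift = split_lds - orig_lds
```

Under these assumptions the split keeps the VMEM address intact but shifts the LDS destination by the folded constant, so a separate operand is the safer choice.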
Mon, May 9
Fri, May 6
One more thing to note: it is already incompatible with sp3 because we prohibit unused vdst, while sp3 enforces it.
A potentially better alternative is to use gfx940 names with _LDS_ in the mnemonic instead of a modifier. This is logically a different opcode anyway. The only downside is that it is not compatible with the documentation and sp3. But then it was not implemented before and therefore not used, so there should be no compatibility problem in practice. It will also be different from MUBUF. Given the difference in both semantics and addressing mode, I personally would prefer these to be different opcodes. At the pseudo level it is certainly easier to have separate ops for this.
LGTM
As far as I understand, this requires whole-program compilation and will not work with late linking?
Thu, May 5
Wed, May 4
- Removed the overload and added i32 %size operand instead.
- LDS pointer is now i8 addrspace(3), i.e. qualified with the address space.
- Rebased on the change to handle hazards between m0 initialization and these operations.
LGTM, but please wait for Austin too.
Tue, May 3
To confirm: is it OK to add yet another imm to the end of the intrinsic's operands to select a byte size, and then remove the overload? If yes, I will do it tomorrow.
It will also need to be rebased on top of D124550 to handle the hazard between M0 initialization and LDS DMA.