User Details
- User Since
- Apr 15 2016, 11:26 AM (362 w, 1 d)
Mar 22 2017
LGTM
Mar 15 2017
Mar 13 2017
Just a wild thought :-) Since address spaces and memory scopes are both target-defined concepts, it seems best for the target to define the enumeration of the ones it supports. Would it make sense for the data layout to give not only the properties of the address spaces but also their textual names, together with the textual names of the memory scopes supported by the target? The memory scopes would include the memory scopes to use for singlethread and crossthread.
Feb 22 2017
LGTM
Feb 21 2017
LGTM
Feb 17 2017
Feb 16 2017
LGTM
Feb 14 2017
Feb 9 2017
LGTM
Feb 8 2017
Feb 7 2017
Feb 2 2017
Jan 31 2017
We also need to use the trap handler ABI query to see whether there is a trap handler and, if there is, add TRAP_HANDLER_SGPR_COUNT to the number of SGPRs budgeted for the wave when calculating the number of waves per EU. TRAP_HANDLER_SGPR_COUNT is 16 for GFX6 onwards.
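To make the suggestion concrete, here is a rough sketch of the adjusted calculation. Only TRAP_HANDLER_SGPR_COUNT = 16 comes from the comment above; the function name, the 512-SGPR pool, and the 10-wave cap are illustrative assumptions, and allocation granularity is ignored.

```python
# Value stated in the review comment: 16 for GFX6 onwards.
TRAP_HANDLER_SGPR_COUNT = 16


def waves_per_eu(sgprs_used, trap_handler_enabled,
                 total_sgprs=512, max_waves=10):
    """Estimate waves per EU as limited by SGPR usage.

    total_sgprs and max_waves are assumed placeholder values, not
    figures taken from the review; real targets may differ.
    """
    # Budget the extra SGPRs for the wave only when a trap handler exists.
    budget = sgprs_used
    if trap_handler_enabled:
        budget += TRAP_HANDLER_SGPR_COUNT
    if budget <= 0:
        raise ValueError("SGPR budget must be positive")
    # The wave count is the SGPR pool divided by the per-wave budget,
    # capped at the hardware maximum.
    return min(max_waves, total_sgprs // budget)


without_trap = waves_per_eu(40, trap_handler_enabled=False)
with_trap = waves_per_eu(40, trap_handler_enabled=True)
```

With these assumed values, a kernel using 40 SGPRs drops from 10 waves to 9 once the trap handler's SGPRs are budgeted, which is the effect the comment asks the calculation to account for.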
Jan 28 2017
Jan 27 2017
Jan 26 2017
Jan 24 2017
Nov 8 2016
Nov 2 2016
Feedback on overall pass.
Oct 27 2016
Oct 26 2016
Oct 25 2016
LGTM
Oct 18 2016
LGTM
Oct 17 2016
Sep 21 2016
I would argue that even without (2) there is a case for using standard LLVM atomic instructions, as it makes it possible to generate the same IR for a given language regardless of whether the target supports scopes. Clang seems to be moving towards generating LLVM IR atomics directly rather than calls to built-ins, so this approach would support that. LLVM already supports two scopes. It also makes code generation much simpler, as the existing machine instructions can be used, which avoids creating a large number of pseudo instructions. The patches D24577 and D24623 that are under review are pretty simple and do that. There would be a lot more code if intrinsics had to be used.
Jun 10 2016
The old R_AMDGPU_ABS32_LO and the new R_AMDGPU_ABS32 are in fact the same thing. The & 0xffffffff is done implicitly because the result of R_AMDGPU_ABS32 is a word32, not a word64, so writing the & 0xffffffff is redundant. The other ABI documents (such as the one for x86) do not include the & 0xffffffff, so it seemed best to follow their conventions. R_AMDGPU_ABS32_HI and R_AMDGPU_ABS32 both return 32 bits, as they are both defined as word32. They differ in that R_AMDGPU_ABS32_HI takes the address and shifts it right by 32 bits, which means that the top 32 bits of the 64-bit address are returned, not the bottom 32 bits.
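A small sketch of the point about the two relocations, assuming the word32 field can be modelled with a 32-bit unsigned integer. The function names and the sample address are hypothetical; `ctypes.c_uint32` is used only because it truncates silently, as storing into a 32-bit field would.

```python
import ctypes


def abs32(address):
    """Model R_AMDGPU_ABS32: store the address in a word32 field.

    The truncation to 32 bits happens implicitly because the field is
    only 32 bits wide, so no explicit mask is written.
    """
    return ctypes.c_uint32(address).value


def abs32_hi(address):
    """Model R_AMDGPU_ABS32_HI: shift right by 32, then store as word32."""
    return ctypes.c_uint32(address >> 32).value


address = 0x0000_1234_89AB_CDEF  # hypothetical 64-bit address

low = abs32(address)      # bottom 32 bits of the address
high = abs32_hi(address)  # top 32 bits of the address

# The explicit mask gives the same value as the implicit truncation,
# which is why writing it in the relocation formula is redundant.
assert low == address & 0xFFFFFFFF
# The two halves recombine into the original 64-bit address.
assert (high << 32) | low == address
```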