
Today

t-tye updated the diff for D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.

Fix clang format warnings.

Mon, Oct 19, 9:13 PM · Restricted Project
t-tye updated the diff for D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.

Really upload the review feedback changes.

Mon, Oct 19, 6:53 PM · Restricted Project
t-tye added inline comments to D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.
Mon, Oct 19, 6:48 PM · Restricted Project
t-tye updated the diff for D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.

Address review comments.

Mon, Oct 19, 6:35 PM · Restricted Project
t-tye committed rG6be9c7d2dc15: [AMDGPU] Correct comment typo in SIMemoryLegaliizer.cpp (authored by t-tye).
[AMDGPU] Correct comment typo in SIMemoryLegaliizer.cpp
Mon, Oct 19, 11:51 AM
t-tye committed rG151e297034c7: [AMDGPU] Simplify cumode handling in SIMemoryLegalizer (authored by t-tye).
[AMDGPU] Simplify cumode handling in SIMemoryLegalizer
Mon, Oct 19, 10:14 AM
t-tye closed D89663: [AMDGPU] Simplify cumode handling in SIMemoryLegalizer.
Mon, Oct 19, 10:14 AM · Restricted Project
t-tye committed rG89d71970cb82: [AMDGPU] Extend hip-toolchin-features.hip test (authored by t-tye).
[AMDGPU] Extend hip-toolchin-features.hip test
Mon, Oct 19, 10:12 AM
t-tye closed D89636: [AMDGPU] Extend hip-toolchin-features.hip test.
Mon, Oct 19, 10:12 AM · Restricted Project
t-tye committed rGceb9940b39ca: [AMDGPU] Correct hsa-diag-v3.s test (authored by t-tye).
[AMDGPU] Correct hsa-diag-v3.s test
Mon, Oct 19, 10:09 AM
t-tye closed D89635: [AMDGPU] Correct hsa-diag-v3.s test.
Mon, Oct 19, 10:09 AM · Restricted Project

Yesterday

t-tye updated the diff for D89635: [AMDGPU] Correct hsa-diag-v3.s test.

Generalize label names.

Sun, Oct 18, 10:27 PM · Restricted Project
t-tye updated the diff for D89635: [AMDGPU] Correct hsa-diag-v3.s test.

Correct whitespace.

Sun, Oct 18, 7:33 PM · Restricted Project
t-tye updated the diff for D89635: [AMDGPU] Correct hsa-diag-v3.s test.

Use FileCheck --check-prefixes

Sun, Oct 18, 7:22 PM · Restricted Project
t-tye requested review of D89663: [AMDGPU] Simplify cumode handling in SIMemoryLegalizer.
Sun, Oct 18, 6:42 PM · Restricted Project

Sat, Oct 17

t-tye requested changes to D89459: [AMDGPU] Update ELF machine numbers for newly-added targets.

Before proceeding, let's sync, as there is one other open question that needs resolving before committing to these changes.

Sat, Oct 17, 4:13 PM · Restricted Project
t-tye requested review of D89636: [AMDGPU] Extend hip-toolchin-features.hip test.
Sat, Oct 17, 4:05 PM · Restricted Project
t-tye added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

LDS and SCRATCH both behave more like TLS. The allocations come into existence when a thread (or group of threads) is created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages, although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed, even though it is meaningful provided the above restrictions are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be a compile/link-time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH object allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that it is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

No, I am not quite saying that, as some languages are not clear. Having said that, some compiler implementations are assuming that for some languages. Basically the rule is language specific, so AA would need to ask the language if it is permissible to assume that or not. Also bear in mind the OpenCL case for LDS, where the kernel argument is not really being passed in from externally, but created independently for each thread/group-of-threads.

Generic pointers are another issue. They are pointers that may point to multiple address spaces. But the rules of dereferencing when they reference the non-global address space are the same. There can be rules that allow a generic pointer to be known to point to only one address space, in which case it can be treated the same as a pointer to that address space. At the hardware level, FLAT instructions can be used to implement language generic pointers. But FLAT instructions can also be used when the address space is fixed, in which case the semantics are the same as the single address space case.

Unlike OpenCL, the CUDA language does not have the address space of pointers as part of the type system. But it still allows allocation of objects to specific address spaces. For CUDA all addressing is conceptually generic, but the allocation address space can be propagated to know the fixed address space of the FLAT operations.

To me the deciding point here was that LDS is not actually allocated on the host, but is instead requested to be allocated at dispatch. If so, then the host cannot get an actual pointer to it and thus cannot convert it to a generic pointer and pass it to a kernel.

Sat, Oct 17, 4:00 PM · Restricted Project
t-tye requested review of D89635: [AMDGPU] Correct hsa-diag-v3.s test.
Sat, Oct 17, 3:10 PM · Restricted Project
t-tye added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

LDS and SCRATCH both behave more like TLS. The allocations come into existence when a thread (or group of threads) is created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages, although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed, even though it is meaningful provided the above restrictions are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be a compile/link-time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH object allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that it is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

Sat, Oct 17, 2:51 PM · Restricted Project
t-tye added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

LDS and SCRATCH both behave more like TLS. The allocations come into existence when a thread (or group of threads) is created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages, although not well explained.

Sat, Oct 17, 9:59 AM · Restricted Project
t-tye requested review of D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations.
Sat, Oct 17, 12:25 AM · Restricted Project

Fri, Oct 16

t-tye accepted D89596: [AMDGPU] Update AMDGPUUsage.rst.

LGTM

Fri, Oct 16, 2:33 PM · Restricted Project
t-tye accepted D89565: [AMDGPU] Fix gfx1032 description in AMDGPUUsage.rst. NFC..

LGTM

Fri, Oct 16, 1:26 PM · Restricted Project
t-tye added inline comments to D89487: [AMDGPU] gfx1032 target.
Fri, Oct 16, 10:02 AM · Restricted Project, Restricted Project
t-tye added a comment to D89077: [AMDGPU] Run hazard recognizer pass later.

Will there be a separate review to parametrize the hazard recognizer so that, when run early, it only resolves the hazards necessary for the register allocator et al.? That would allow the other hazards to be resolved only in the final run. This would avoid splitting memory clauses early, before the waitcnt/invalidate instructions that may themselves split the memory clauses have been inserted.

Fri, Oct 16, 9:50 AM · Restricted Project
t-tye committed rGe2af9bd6118e: [AMDGPU] Correct comment typo in AMDGPUSubtarget.h (authored by t-tye).
[AMDGPU] Correct comment typo in AMDGPUSubtarget.h
Fri, Oct 16, 1:49 AM

Thu, Oct 15

t-tye accepted D89484: [AMDGPU][HIP] Switch default DWARF version to 5.

LGTM

Thu, Oct 15, 4:50 PM · Restricted Project
t-tye accepted D89459: [AMDGPU] Update ELF machine numbers for newly-added targets.

LGTM

Thu, Oct 15, 9:20 AM · Restricted Project

Wed, Oct 14

t-tye committed rGb3a38bc2dcab: [AMDGPU] Correct typos in SIMemoryLegalizer.cpp comments (authored by t-tye).
[AMDGPU] Correct typos in SIMemoryLegalizer.cpp comments
Wed, Oct 14, 7:29 PM

Tue, Oct 13

t-tye committed rG907d799070c3: [AMDGPU] Cleanup memory legalizer interfaces (authored by t-tye).
[AMDGPU] Cleanup memory legalizer interfaces
Tue, Oct 13, 11:28 PM
t-tye closed D89355: [AMDGPU] Cleanup memory legalizer interfaces.
Tue, Oct 13, 11:27 PM · Restricted Project
t-tye requested review of D89355: [AMDGPU] Cleanup memory legalizer interfaces.
Tue, Oct 13, 6:08 PM · Restricted Project
t-tye added inline comments to rG666ef0db208b: [AMDGPU] Add gfx602, gfx705, gfx805 targets.
Tue, Oct 13, 4:23 PM

Mon, Oct 12

t-tye committed rGfe145b66ecfd: [AMDGPU] Correct processor names for gfx1010 and gfx1011 (authored by t-tye).
[AMDGPU] Correct processor names for gfx1010 and gfx1011
Mon, Oct 12, 1:16 PM
t-tye closed D89259: [AMDGPU] Correct processor names for gfx1010 and gfx1011.
Mon, Oct 12, 1:16 PM · Restricted Project
t-tye added a comment to D89170: [AMDGPU] Select flat scratch instructions where available.

I haven't done a meaningful review, but I wanted to note that this will require changes to the debug information (which isn't committed yet). I think this could be as simple as scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address.

Mon, Oct 12, 12:54 PM · Restricted Project
t-tye requested review of D89259: [AMDGPU] Correct processor names for gfx1010 and gfx1011.
Mon, Oct 12, 12:26 PM · Restricted Project

Fri, Oct 9

t-tye accepted D89125: AMDGPU: Remove -mamdgpu-debugger-abi option.

LGTM

Fri, Oct 9, 1:35 PM · Restricted Project

Thu, Oct 8

t-tye added inline comments to D89076: AMDGPU: Update AMDHSA code object version handling.
Thu, Oct 8, 7:53 PM · Restricted Project, Restricted Project
t-tye added a comment to D89077: [AMDGPU] Run hazard recognizer pass later.

Is this now running after the waitcnt insertion pass? That would avoid the NOPs currently being inserted to split memory clauses, which are unnecessary because the waitcnt instructions will split the clauses.

We also insert nops in the post-RA scheduler.

Thu, Oct 8, 7:31 PM · Restricted Project
t-tye accepted D89091: Regenerate ClangCommandLineReference.rst.

LGTM

Thu, Oct 8, 7:27 PM · Restricted Project
t-tye accepted D89076: AMDGPU: Update AMDHSA code object version handling.

LGTM. Requested changes will happen in a separate patch.

Thu, Oct 8, 7:25 PM · Restricted Project, Restricted Project
t-tye added a comment to D89077: [AMDGPU] Run hazard recognizer pass later.

Is this now running after the waitcnt insertion pass? That would avoid the NOPs currently being inserted to split memory clauses, which are unnecessary because the waitcnt instructions will split the clauses.

Thu, Oct 8, 5:33 PM · Restricted Project
t-tye requested changes to D89076: AMDGPU: Update AMDHSA code object version handling.

LGTM except the two suggestions.

Thu, Oct 8, 5:30 PM · Restricted Project, Restricted Project
t-tye accepted D89076: AMDGPU: Update AMDHSA code object version handling.
Thu, Oct 8, 5:29 PM · Restricted Project, Restricted Project

Wed, Oct 7

t-tye added inline comments to D88916: [AMDGPU] Add gfx602, gfx705, gfx805 targets.
Wed, Oct 7, 2:01 AM · Restricted Project, Restricted Project

Tue, Oct 6

t-tye added inline comments to D88916: [AMDGPU] Add gfx602, gfx705, gfx805 targets.
Tue, Oct 6, 7:15 PM · Restricted Project, Restricted Project

Sep 17 2020

t-tye added inline comments to D87858: [hip] Add HIP scope atomic ops..
Sep 17 2020, 4:33 PM · Restricted Project
t-tye requested changes to D87858: [hip] Add HIP scope atomic ops..
Sep 17 2020, 2:55 PM · Restricted Project
t-tye added inline comments to D87858: [hip] Add HIP scope atomic ops..
Sep 17 2020, 2:55 PM · Restricted Project

Sep 9 2020

t-tye committed rG72e2fbde5456: [AMDGPU] Correct gfx1031 XNACK setting documentation (authored by t-tye).
[AMDGPU] Correct gfx1031 XNACK setting documentation
Sep 9 2020, 12:45 PM
t-tye closed D87198: [AMDGPU] Correct gfx1031 XNACK setting documentation.
Sep 9 2020, 12:45 PM · Restricted Project
t-tye accepted D87356: [docs] Fix typos.

LGTM

Sep 9 2020, 3:17 AM · Restricted Project

Sep 5 2020

t-tye requested review of D87198: [AMDGPU] Correct gfx1031 XNACK setting documentation.
Sep 5 2020, 4:00 PM · Restricted Project

Sep 3 2020

t-tye added inline comments to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Sep 3 2020, 8:57 AM · Restricted Project

Aug 31 2020

t-tye added a comment to D86902: [AMDGPU] Correct documentation for default setting of sram-ecc.

So it seems the e_flags report that sram-ecc is off, yet the code is generated for sram-ecc on.

Aug 31 2020, 8:27 PM · Restricted Project
t-tye retitled D86902: [AMDGPU] Correct documentation for default setting of sram-ecc from [AMDGPU] Correct documentation for default setting of sram-ecc to on to [AMDGPU] Correct documentation for default setting of sram-ecc.
Aug 31 2020, 8:25 PM · Restricted Project
t-tye retitled D86902: [AMDGPU] Correct documentation for default setting of sram-ecc from [AMDGPU] Correct documetnation for default setting of sram-ecc to on to [AMDGPU] Correct documentation for default setting of sram-ecc to on.
Aug 31 2020, 7:10 PM · Restricted Project
t-tye updated the diff for D86902: [AMDGPU] Correct documentation for default setting of sram-ecc.

Correct commit heading.

Aug 31 2020, 7:09 PM · Restricted Project
t-tye requested review of D86902: [AMDGPU] Correct documentation for default setting of sram-ecc.
Aug 31 2020, 7:07 PM · Restricted Project

Aug 24 2020

t-tye accepted D86340: [AMDGPU, docs] Fix typos.

LGTM

Aug 24 2020, 8:24 AM · Restricted Project

Aug 21 2020

t-tye requested changes to D86340: [AMDGPU, docs] Fix typos.
Aug 21 2020, 10:14 AM · Restricted Project

Aug 20 2020

t-tye requested changes to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 20 2020, 11:04 AM · Restricted Project
t-tye resigned from D84522: [AMDGPU] Reorganize GCN subtarget features for unaligned access.

Thanks for updating the subtarget features. I defer to @arsenm for the rest of the review.

Aug 20 2020, 8:58 AM · Restricted Project

Aug 19 2020

t-tye added inline comments to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 19 2020, 7:26 PM · Restricted Project
t-tye committed rGb690c1157e90: [AMDGPU] Correct DWARF register defintions (authored by t-tye).
[AMDGPU] Correct DWARF register defintions
Aug 19 2020, 6:19 PM
t-tye closed D86259: [AMDGPU] Correct DWARF register defintions.
Aug 19 2020, 6:19 PM · Restricted Project
t-tye requested review of D86259: [AMDGPU] Correct DWARF register defintions.
Aug 19 2020, 6:15 PM · Restricted Project
t-tye accepted D86206: [NFC] Fix typo in AMDGPU doc.

LGTM

Aug 19 2020, 6:53 AM · Restricted Project

Aug 18 2020

t-tye requested changes to D84522: [AMDGPU] Reorganize GCN subtarget features for unaligned access.

We are about to change the xnack and sramecc subtarget features to remove the "DoesNot" so I would suggest that FeatureDoesNotSupportUnalignedBufferAccess changes to FeatureSupportUnalignedBufferAccess.

Aug 18 2020, 3:38 PM · Restricted Project
t-tye added inline comments to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 18 2020, 2:39 PM · Restricted Project

Aug 17 2020

t-tye added inline comments to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 17 2020, 8:29 PM · Restricted Project
t-tye requested changes to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 17 2020, 7:51 PM · Restricted Project
t-tye added inline comments to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 17 2020, 7:51 PM · Restricted Project

Aug 13 2020

t-tye added inline comments to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 13 2020, 6:34 PM · Restricted Project
t-tye added a reviewer for D84822: Add documentation for target ID and ClangOffloadBundlerFormat: arsenm.
Aug 13 2020, 11:02 AM · Restricted Project
t-tye added inline comments to D85882: [AMDGPU] Update subtarget features for new target ID support.
Aug 13 2020, 8:14 AM · Restricted Project

Aug 12 2020

t-tye accepted D84822: Add documentation for target ID and ClangOffloadBundlerFormat.

LGTM

Aug 12 2020, 10:01 PM · Restricted Project
t-tye requested changes to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 12 2020, 9:57 AM · Restricted Project
t-tye added inline comments to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 12 2020, 9:57 AM · Restricted Project

Aug 9 2020

t-tye added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

Sorry, just lots of questions:-)

Aug 9 2020, 1:48 PM · Restricted Project

Aug 6 2020

t-tye committed rGce74e97d9b15: [AMDGPU] Correct missing sram-ecc target feature for gfx906 (authored by t-tye).
[AMDGPU] Correct missing sram-ecc target feature for gfx906
Aug 6 2020, 3:13 PM
t-tye closed D85476: [AMDGPU] Correct missing sram-ecc target feature for gfx906.
Aug 6 2020, 3:13 PM · Restricted Project
t-tye added a reviewer for D85476: [AMDGPU] Correct missing sram-ecc target feature for gfx906: yaxunl.
Aug 6 2020, 3:07 PM · Restricted Project
t-tye added a reviewer for D85476: [AMDGPU] Correct missing sram-ecc target feature for gfx906: kzhuravl.
Aug 6 2020, 3:07 PM · Restricted Project
t-tye requested review of D85476: [AMDGPU] Correct missing sram-ecc target feature for gfx906.
Aug 6 2020, 3:06 PM · Restricted Project

Aug 5 2020

t-tye accepted D84822: Add documentation for target ID and ClangOffloadBundlerFormat.

LGTM except for minor :doc: reference comment.

Aug 5 2020, 7:12 PM · Restricted Project
t-tye added inline comments to D84822: Add documentation for target ID and ClangOffloadBundlerFormat.
Aug 5 2020, 12:51 PM · Restricted Project

Jul 29 2020

t-tye committed rG629467eb981a: [AMDGPU] Fix DWARF extensions User Guide table of contents (authored by t-tye).
[AMDGPU] Fix DWARF extensions User Guide table of contents
Jul 29 2020, 10:12 PM
t-tye committed rGe24f5f314914: [AMDGPU] DWARF proposal changes (authored by t-tye).
[AMDGPU] DWARF proposal changes
Jul 29 2020, 10:08 PM
t-tye closed D70523: [AMDGPU] Update AMDGPUUsage with DWARF proposal.
Jul 29 2020, 10:08 PM · debug-info, Restricted Project
t-tye updated the diff for D70523: [AMDGPU] Update AMDGPUUsage with DWARF proposal.

Clarify that these are extensions to DWARF 5 and not as yet a proposal.

Jul 29 2020, 10:04 PM · debug-info, Restricted Project
t-tye reopened D70523: [AMDGPU] Update AMDGPUUsage with DWARF proposal.

Clarify that these are extensions to DWARF 5 and not as yet a proposal to DWARF.

Jul 29 2020, 9:34 PM · debug-info, Restricted Project
t-tye committed rG5aa2fd88cfb7: [AMDGPU] DWARF proposal changes for expression context (authored by t-tye).
[AMDGPU] DWARF proposal changes for expression context
Jul 29 2020, 7:00 PM
t-tye closed D70523: [AMDGPU] Update AMDGPUUsage with DWARF proposal.
Jul 29 2020, 6:59 PM · debug-info, Restricted Project
t-tye updated the diff for D70523: [AMDGPU] Update AMDGPUUsage with DWARF proposal.

[AMDGPU] DWARF proposal changes for expression context

Jul 29 2020, 6:02 PM · debug-info, Restricted Project
t-tye reopened D70523: [AMDGPU] Update AMDGPUUsage with DWARF proposal.

Need to add clarification of how context is used in the evaluation of DWARF expressions and how evaluation relates to CFI information.

Jul 29 2020, 3:46 PM · debug-info, Restricted Project

Jul 22 2020

t-tye added inline comments to D84194: [AMDGPU] Correct the number of SGPR blocks used for GFX9.
Jul 22 2020, 12:59 AM · Restricted Project

Jul 1 2020

t-tye committed rG31fdcf64d24d: [AMDGPU] Update DWARF proposal (authored by t-tye).
[AMDGPU] Update DWARF proposal
Jul 1 2020, 2:05 PM