
t-tye (Tony Tye)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 22 2017, 6:01 PM (221 w, 23 h)

Recent Activity

Mon, Jun 7

t-tye added a comment to D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers..

You probably need to wrap all prologue LDS stores into a block to execute it only from lane 0 and add a barrier after. @t-tye correct me if I am wrong.

But I remember that we had decided to avoid a barrier here, and instead just make sure that each thread within each wave executes the store instructions? In any case, let me clarify it with @t-tye and @b-sumner.

I do not remember, but we can probably omit it since it is a single store readonly memory. Anyway, a confirmation from @t-tye would be nice.

For the record, the agreed way is to do a store from lane 0 of each wave and follow with a wave barrier.

Mon, Jun 7, 8:27 PM · Restricted Project

Fri, May 21

t-tye committed rG355114a7532d: [NFC][AMDGPU] Add documentation for AMD Instinct MI100 accelerator (authored by t-tye).
[NFC][AMDGPU] Add documentation for AMD Instinct MI100 accelerator
Fri, May 21, 9:52 AM
t-tye closed D102859: [NFC][AMDGPU] Add documentation for AMD Instinct MI100 accelerator.
Fri, May 21, 9:52 AM · Restricted Project
t-tye committed rGb408efe4ffcd: [NFC][AMDGPU] Mark C code in AMDGPUUsage.rst (authored by t-tye).
[NFC][AMDGPU] Mark C code in AMDGPUUsage.rst
Fri, May 21, 3:10 AM
t-tye closed D102910: [NFC][AMDGPU] Mark C code in AMDGPUUsage.rst.
Fri, May 21, 3:09 AM · Restricted Project
t-tye requested review of D102910: [NFC][AMDGPU] Mark C code in AMDGPUUsage.rst.
Fri, May 21, 2:56 AM · Restricted Project

Thu, May 20

t-tye added a reviewer for D102859: [NFC][AMDGPU] Add documentation for AMD Instinct MI100 accelerator: dp.
Thu, May 20, 1:26 PM · Restricted Project
t-tye updated the diff for D102859: [NFC][AMDGPU] Add documentation for AMD Instinct MI100 accelerator.

Add feedback from @dp .

Thu, May 20, 1:26 PM · Restricted Project
t-tye requested changes to D102837: [AMDGPU][DOC][NFC] Added links to public description of MI100 ISA.

Can you review D102859 instead? I will add your change to that as well.

Thu, May 20, 1:15 PM · Restricted Project
t-tye requested review of D102859: [NFC][AMDGPU] Add documentation for AMD Instinct MI100 accelerator.
Thu, May 20, 11:48 AM · Restricted Project

Wed, May 19

t-tye added a comment to D102691: [AMDGPU][Libomptarget] Remove global KernelNameMap.

I like the direction. Could we hold it for a day or so? I'd like to check through the uses of the kernel name to see if there's a missing edge case, or if we can simplify this a step further.

It looks like the msgpack data always contains the foo and the foo.kd strings, under different keys. I wonder if that's something we can rely on the compiler emitting.

Wed, May 19, 1:12 AM · Restricted Project

May 18 2021

t-tye accepted D102708: AMDGPU/NFC: Replace EF_AMDGPU_MACH_AMDGCN_RESERVED_0X3E with EF_AMDGPU_MACH_AMDGCN_GFX1034.

This needs AMDGPUUsage to also be updated that these values are now reserved.

May 18 2021, 4:06 PM · Restricted Project
t-tye reopened D102708: AMDGPU/NFC: Replace EF_AMDGPU_MACH_AMDGCN_RESERVED_0X3E with EF_AMDGPU_MACH_AMDGCN_GFX1034.

This needs AMDGPUUsage to also be updated that these values are now reserved.

May 18 2021, 1:44 PM · Restricted Project

May 17 2021

t-tye accepted D102366: [AMDGPU] Do not check denorm for LDS FP atomic with unsafe flag.

LGTM

May 17 2021, 3:58 PM · Restricted Project

May 13 2021

t-tye accepted D102432: [AMDGPU] Add support for architected flat scratch.

The documentation parts LGTM.

May 13 2021, 2:38 PM · Restricted Project
t-tye added inline comments to D102432: [AMDGPU] Add support for architected flat scratch.
May 13 2021, 2:07 PM · Restricted Project
t-tye added a comment to D94648: [amdgpu] Implement lower function LDS pass.

I suspect there is something in hardware that rounds LDS allocation up to a boundary, so as long as the kernel looks like it uses some non-zero amount of LDS, the out of bounds read hits in the allocated region.

May 13 2021, 8:23 AM · Restricted Project

May 12 2021

t-tye accepted D102347: [AMDGPU] Only allow global fp atomics with unsafe option.

Added @b-sumner to review.

May 12 2021, 7:54 PM · Restricted Project
t-tye added a reviewer for D102347: [AMDGPU] Only allow global fp atomics with unsafe option: b-sumner.
May 12 2021, 7:41 PM · Restricted Project
t-tye added inline comments to D102347: [AMDGPU] Only allow global fp atomics with unsafe option.
May 12 2021, 12:06 PM · Restricted Project

May 11 2021

t-tye added inline comments to D102252: [AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2.
May 11 2021, 12:51 PM · Restricted Project
t-tye added a comment to D102177: [AMDGPU][RFC] Improve sgpr function arguments.

We have another proposal we were working on to rearrange these a bit differently. We need to account for a few more inputs in the layout.

As long as this remains in GFX land, we should be fine with it because our new proposal is for compute only (as of now).

I would like to keep the same calling convention in compute and graphics. At least regarding the stack pointer and others, because I don’t see a compelling reason to diverge even more. Actually, I’d like it if they were more common than they are now, because we implement some things twice at the moment.
The compute proposal should work just fine; if we move the stack and frame pointer, we end up with the same benefits as in this patch. I commented on the internal proposal for this (I hope I found the right one?).

Well, then you are proposing unification of both ABIs, and that has not been discussed thoroughly internally. The layout needs to be documented and reviewed internally before we can proceed with this patch.

May 11 2021, 8:38 AM · Restricted Project
t-tye committed rGd6a228cba47f: [NFC][AMDGPU] Correct product name for gfx908 (authored by t-tye).
[NFC][AMDGPU] Correct product name for gfx908
May 11 2021, 8:23 AM
t-tye closed D102209: [NFC][AMDGPU] Correct product name for gfx908.
May 11 2021, 8:23 AM · Restricted Project

May 10 2021

t-tye requested review of D102209: [NFC][AMDGPU] Correct product name for gfx908.
May 10 2021, 7:30 PM · Restricted Project

Apr 29 2021

t-tye accepted D101304: AMDGPU/llvm-readobj: Add missing tests for note parsing/displaying.

LGTM

Apr 29 2021, 2:32 PM · Restricted Project

Apr 20 2021

t-tye added a comment to D100404: Add Global support for #pragma clang attributes.

Sure, it will work for me, though FWIW I think it's the worst option of the 3 potential solutions, and I'll be puzzled if we end up with that UI.
Can anyone else speak up and state your opinions, please?

Apr 20 2021, 9:04 AM

Apr 16 2021

t-tye committed rG13875aab4e7d: [AMDGPU] Enforce that gfx802/803/805 do not support XNACK (authored by t-tye).
[AMDGPU] Enforce that gfx802/803/805 do not support XNACK
Apr 16 2021, 12:35 PM
t-tye closed D100679: [AMDGPU] Enforce that gfx802/803/805 do not support XNACK.
Apr 16 2021, 12:35 PM · Restricted Project
t-tye requested review of D100679: [AMDGPU] Enforce that gfx802/803/805 do not support XNACK.
Apr 16 2021, 12:27 PM · Restricted Project

Apr 15 2021

t-tye added inline comments to D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.
Apr 15 2021, 8:54 AM · Restricted Project

Apr 14 2021

t-tye added inline comments to D100481: [AMDGPU] Disable forceful inline of non-kernel functions which use LDS..
Apr 14 2021, 1:12 PM · Restricted Project

Apr 12 2021

t-tye added inline comments to D100281: [AMDGPU] Revise handling of preexisting waitcnt.
Apr 12 2021, 12:17 AM · Restricted Project

Apr 8 2021

t-tye accepted D100126: AMDGPU: Add gfx90c support to code object v2 for backwards compatibility.

LGTM

Apr 8 2021, 12:31 PM · Restricted Project
t-tye requested changes to D100126: AMDGPU: Add gfx90c support to code object v2 for backwards compatibility.
Apr 8 2021, 11:35 AM · Restricted Project

Apr 7 2021

t-tye committed rG2e9465ce2ef6: [NFC][AMDGPU] Correct indentation in AMDGPUUsage.rst (authored by t-tye).
[NFC][AMDGPU] Correct indentation in AMDGPUUsage.rst
Apr 7 2021, 6:01 PM
t-tye added a comment to D100072: [AMDGPU] Allow -amdgpu-unsafe-fp-atomics to ignore denorm mode.

Documentation LGTM

Apr 7 2021, 5:31 PM · Restricted Project, Restricted Project
t-tye added inline comments to D100072: [AMDGPU] Allow -amdgpu-unsafe-fp-atomics to ignore denorm mode.
Apr 7 2021, 4:57 PM · Restricted Project, Restricted Project
t-tye accepted D100069: Disable use of SCC bit from asm.

LGTM

Apr 7 2021, 3:19 PM · Restricted Project
t-tye committed rG4658cd4c18ba: [AMDGPU] Update gfx90a memory model support (authored by t-tye).
[AMDGPU] Update gfx90a memory model support
Apr 7 2021, 3:18 PM
t-tye closed D100070: [AMDGPU] Update gfx90a memory model support.
Apr 7 2021, 3:18 PM · Restricted Project
t-tye added inline comments to D100069: Disable use of SCC bit from asm.
Apr 7 2021, 3:14 PM · Restricted Project
t-tye requested review of D100070: [AMDGPU] Update gfx90a memory model support.
Apr 7 2021, 3:02 PM · Restricted Project

Apr 4 2021

t-tye added a comment to D96336: [AMDGPU] Save VGPR of whole wave when spilling.

As an aside, if we are moving to using flat scratch in the main, is it possible to replace most of this with s_scratch_store / s_scratch_load and avoid the need for a VGPR entirely?

That would make sense, but it feels like s_scratch instructions got removed in newer hardware.

Apr 4 2021, 10:01 AM · Restricted Project

Apr 3 2021

t-tye added a reviewer for D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask.: mjbedy.
Apr 3 2021, 9:29 AM · Restricted Project

Apr 1 2021

t-tye committed rG4c70f56ec67b: [NFC][AMDGPU] Add product names for gfx908 and gfx10 processors (authored by t-tye).
[NFC][AMDGPU] Add product names for gfx908 and gfx10 processors
Apr 1 2021, 5:59 PM
t-tye closed D99781: [NFC][AMDGPU] Add product names for gfx908 and gfx10 processors.
Apr 1 2021, 5:58 PM · Restricted Project
t-tye requested review of D99781: [NFC][AMDGPU] Add product names for gfx908 and gfx10 processors.
Apr 1 2021, 5:39 PM · Restricted Project

Mar 29 2021

t-tye accepted D93125: Update AMDGPU PAL usage documentation.

LGTM

Mar 29 2021, 7:07 PM · Restricted Project
t-tye added inline comments to D93125: Update AMDGPU PAL usage documentation.
Mar 29 2021, 3:27 PM · Restricted Project
t-tye added inline comments to D93125: Update AMDGPU PAL usage documentation.
Mar 29 2021, 3:09 PM · Restricted Project

Mar 25 2021

t-tye committed rG850fcedb272f: [NFC][AMDGPU] Corrections to AMD GPU initial kernel launch documentation (authored by t-tye).
[NFC][AMDGPU] Corrections to AMD GPU initial kernel launch documentation
Mar 25 2021, 7:06 PM
t-tye closed D99223: [NFC][AMDGPU] Corrections to AMD GPU initial kernel launch documentation.
Mar 25 2021, 7:06 PM · Restricted Project

Mar 24 2021

t-tye added inline comments to D99128: [AMDGPU] Removed unnecessary cache invalidations..
Mar 24 2021, 10:18 PM · Restricted Project

Mar 23 2021

t-tye retitled D99223: [NFC][AMDGPU] Corrections to AMD GPU initial kernel launch documentation from [NFC][AMDGPU] Corrections to AMD GPU initial kernel launch documention to [NFC][AMDGPU] Corrections to AMD GPU initial kernel launch documentation.
Mar 23 2021, 3:44 PM · Restricted Project
t-tye requested review of D99223: [NFC][AMDGPU] Corrections to AMD GPU initial kernel launch documentation.
Mar 23 2021, 3:41 PM · Restricted Project
t-tye requested changes to D99128: [AMDGPU] Removed unnecessary cache invalidations..
Mar 23 2021, 1:57 PM · Restricted Project
t-tye committed rGc181724a9b9a: [NFC][AMDGPU] Reserve AMD GPU ELF machine number 0x41 (authored by t-tye).
[NFC][AMDGPU] Reserve AMD GPU ELF machine number 0x41
Mar 23 2021, 10:53 AM
t-tye closed D99196: [NFC][AMDGPU] Reserve AMD GPU ELF machine number 0x41.
Mar 23 2021, 10:53 AM · Restricted Project
t-tye requested review of D99196: [NFC][AMDGPU] Reserve AMD GPU ELF machine number 0x41.
Mar 23 2021, 9:26 AM · Restricted Project

Mar 22 2021

t-tye committed rG1e04706adbb1: [AMDGPU] Reserve ELF code (authored by t-tye).
[AMDGPU] Reserve ELF code
Mar 22 2021, 9:31 PM
t-tye closed D99122: [AMDGPU] Reserve ELF code.
Mar 22 2021, 9:31 PM · Restricted Project
t-tye updated the diff for D99122: [AMDGPU] Reserve ELF code.

Address review comments.

Mar 22 2021, 8:47 PM · Restricted Project
t-tye updated the diff for D99122: [AMDGPU] Reserve ELF code.

Address review comments.

Mar 22 2021, 7:00 PM · Restricted Project
t-tye requested review of D99122: [AMDGPU] Reserve ELF code.
Mar 22 2021, 3:30 PM · Restricted Project
t-tye added a comment to D99061: [AMDGPU] Only unbundle memory accesses in SIMemoryLegalizer.

What was the decision on the philosophy of who should be doing unbundling in general? How is it known that it is safe for the memory legalizer to do this unbundling? Is the policy documented in some AMD GPU design page?

Mar 22 2021, 10:44 AM · Restricted Project

Mar 19 2021

t-tye added a comment to D98940: [AMDGPU] Allow index optimisation in SIPreEmitPeephole for bundles.

See D72737 and D91048 for the history of that. I'm not saying it's the right place to do it. The point of the current patch is to help us move away from that, i.e. to do less unbundling in SIMemoryLegalizer.

I do not think SIMemoryLegalizer should be doing any unbundling as it is conflating tasks. So will this direction manage to eliminate it?

No, not entirely. D72737 added some unbundling (only of bundles containing loads or stores) and D91048 added some more (all bundles). The intention here is just to undo the D91048 part.

Why would SIMemoryLegalizer know it is correct to do the unbundling? Presumably the instructions are in a bundle for a reason? How does SIMemoryLegalizer know that that reason is no longer necessary? Is SIMemoryLegalizer meant to know the passes that run after it and what their needs are? Is there a point in the compilation where the need for these bundles to exist disappears? If so, shouldn't that be the point at which the unbundling is done? It feels uncomfortable to put this kind of thing in passes as it makes things very brittle. If someone made changes to the bundling, would they be aware that SIMemoryLegalizer is doing this?

Don't ask me! I'm just trying to explain the status quo. I am also uncomfortable with the idea of bundles being unbundled or modified in general, though I can see that it would be useful to be able to bundle some stuff in one pass and then later unbundle "my" bundles only -- but at the moment there is no robust way of distinguishing "my" bundles from anyone else's.

Anyway I think you need to take this up with @rampitec who wrote D72737.

Mar 19 2021, 7:56 AM · Restricted Project
t-tye added a comment to D98940: [AMDGPU] Allow index optimisation in SIPreEmitPeephole for bundles.

See D72737 and D91048 for the history of that. I'm not saying it's the right place to do it. The point of the current patch is to help us move away from that, i.e. to do less unbundling in SIMemoryLegalizer.

I do not think SIMemoryLegalizer should be doing any unbundling as it is conflating tasks. So will this direction manage to eliminate it?

No, not entirely. D72737 added some unbundling (only of bundles containing loads or stores) and D91048 added some more (all bundles). The intention here is just to undo the D91048 part.

Mar 19 2021, 7:38 AM · Restricted Project
t-tye added a comment to D98940: [AMDGPU] Allow index optimisation in SIPreEmitPeephole for bundles.

Currently these are unbundled in SIMemoryLegalizer.

Why is the SIMemoryLegalizer the right place to do the unbundling? Why would the SIMemoryLegalizer know that it has "permission" to do that? The SIMemoryLegalizer was intended to do the single job of expanding the atomics semantics.

See D72737 and D91048 for the history of that. I'm not saying it's the right place to do it. The point of the current patch is to help us move away from that, i.e. to do less unbundling in SIMemoryLegalizer.

Mar 19 2021, 7:30 AM · Restricted Project
t-tye added a comment to D98940: [AMDGPU] Allow index optimisation in SIPreEmitPeephole for bundles.

Currently these are unbundled in SIMemoryLegalizer.

Mar 19 2021, 7:07 AM · Restricted Project

Mar 16 2021

t-tye added a comment to D98746: [clang][amdgpu] Use implicit code object default.

I vaguely remember that clang needed to know what code object it was going to request as it used that to either validate other options, or change the format of other passed cc1 options. If that is true, then I am not sure the defaulting approach works as clang will not know what the backend will be defaulting to. @yaxunl can you remember this?

Mar 16 2021, 7:42 PM · Restricted Project
t-tye added a comment to D98746: [clang][amdgpu] Use implicit code object default.

I have no opinion, just making an observation and defer to @kzhuravl .

Mar 16 2021, 5:14 PM · Restricted Project
t-tye added inline comments to D98746: [clang][amdgpu] Use implicit code object default.
Mar 16 2021, 4:55 PM · Restricted Project

Mar 8 2021

t-tye added a comment to D98201: [CUDA][HIP] Add #pragma clang force_cuda_device_globals {begin,end}.

An example usage is to run a large part of the gdb test suite on the GPU. The tests normally run on the CPU, but can also be made to run on the GPU within a test harness that emulates the necessary environment. For that to work the variables need to be forced to be device. The issues of concurrency do not happen due to the nature of the environment. Porting or modifying the entire test suite is not particularly viable.

Mar 8 2021, 6:39 PM
t-tye committed rG2817e21c4172: [NFC][AMDGPU] Correct typo in DWARF Extensions For Heterogeneous Debugging (authored by t-tye).
[NFC][AMDGPU] Correct typo in DWARF Extensions For Heterogeneous Debugging
Mar 8 2021, 4:25 PM
t-tye closed D98157: [NFC][AMDGPU] Correct typo in DWARF Extensions For Heterogeneous Debugging.
Mar 8 2021, 4:24 PM · Restricted Project
t-tye added inline comments to D98083: [NFC][AMDGPU] Improve documentation of AMDGPU handling of volatile.
Mar 8 2021, 5:45 AM · Restricted Project

Mar 7 2021

t-tye requested review of D98157: [NFC][AMDGPU] Correct typo in DWARF Extensions For Heterogeneous Debugging.
Mar 7 2021, 2:45 PM · Restricted Project
t-tye committed rGf79bab3fd7f4: [NFC][AMDGPU] DWARF Extensions For Heterogeneous Debugging clarifications (authored by t-tye).
[NFC][AMDGPU] DWARF Extensions For Heterogeneous Debugging clarifications
Mar 7 2021, 10:35 AM
t-tye closed D98137: [NFC][AMDGPU] DWARF Extensions For Heterogeneous Debugging clarifications.
Mar 7 2021, 10:35 AM · Restricted Project
t-tye requested review of D98137: [NFC][AMDGPU] DWARF Extensions For Heterogeneous Debugging clarifications.
Mar 7 2021, 12:52 AM · Restricted Project

Mar 6 2021

t-tye committed rGca602a72b37d: [NFC][AMDGPU]DWARF Extensions For Heterogeneous Debugging generic type endianity (authored by t-tye).
[NFC][AMDGPU]DWARF Extensions For Heterogeneous Debugging generic type endianity
Mar 6 2021, 8:52 PM
t-tye closed D98126: [NFC][AMDGPU]DWARF Extensions For Heterogeneous Debugging generic type endianity.
Mar 6 2021, 8:51 PM · Restricted Project
t-tye added a reviewer for D98126: [NFC][AMDGPU]DWARF Extensions For Heterogeneous Debugging generic type endianity: zoran.zaric.
Mar 6 2021, 1:56 PM · Restricted Project
t-tye requested review of D98126: [NFC][AMDGPU]DWARF Extensions For Heterogeneous Debugging generic type endianity.
Mar 6 2021, 12:16 PM · Restricted Project
t-tye added inline comments to D98085: [AMDGPU] Always expand system scope fp atomics on gfx90a.
Mar 6 2021, 10:32 AM · Restricted Project

Mar 5 2021

t-tye resigned from D98085: [AMDGPU] Always expand system scope fp atomics on gfx90a.

Discussed offline.

Mar 5 2021, 6:44 PM · Restricted Project
t-tye added inline comments to D98085: [AMDGPU] Always expand system scope fp atomics on gfx90a.
Mar 5 2021, 6:19 PM · Restricted Project
t-tye added inline comments to D98085: [AMDGPU] Always expand system scope fp atomics on gfx90a.
Mar 5 2021, 6:04 PM · Restricted Project
t-tye added inline comments to D98085: [AMDGPU] Always expand system scope fp atomics on gfx90a.
Mar 5 2021, 5:47 PM · Restricted Project
t-tye requested changes to D98085: [AMDGPU] Always expand system scope fp atomics on gfx90a.
Mar 5 2021, 5:44 PM · Restricted Project
t-tye requested review of D98083: [NFC][AMDGPU] Improve documentation of AMDGPU handling of volatile.
Mar 5 2021, 3:25 PM · Restricted Project

Mar 2 2021

t-tye added a comment to D97670: [AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32).

What other debuggers do you have in mind? There is the rocgdb and the UMR hardware debugger. Do you know of others for AMDGPU? If non-HSA targets would like to support a debugger, it would make sense that they also add support for llvm.debugtrap as well :-)

Mar 2 2021, 5:48 PM · Restricted Project
t-tye added inline comments to D97598: [NFC][AMDGPU] Document the AMDGPU target feature defaults.
Mar 2 2021, 9:18 AM · Restricted Project

Mar 1 2021

t-tye added a comment to D97670: [AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32).

The expected use case is for frontends to insert this into shaders that are to be run under a debugger. The shader can then be resumed or single stepped from the point of the call under debugger control.

Mar 1 2021, 6:11 PM · Restricted Project
t-tye added inline comments to D97598: [NFC][AMDGPU] Document the AMDGPU target feature defaults.
Mar 1 2021, 5:24 PM · Restricted Project
t-tye added inline comments to D97598: [NFC][AMDGPU] Document the AMDGPU target feature defaults.
Mar 1 2021, 12:54 PM · Restricted Project

Feb 27 2021

t-tye committed rG2da13f1246e1: [NFC][AMDGPU] Document the AMDGPU target feature defaults (authored by t-tye).
[NFC][AMDGPU] Document the AMDGPU target feature defaults
Feb 27 2021, 10:29 AM
t-tye closed D97598: [NFC][AMDGPU] Document the AMDGPU target feature defaults.
Feb 27 2021, 10:29 AM · Restricted Project

Feb 26 2021

t-tye requested review of D97598: [NFC][AMDGPU] Document the AMDGPU target feature defaults.
Feb 26 2021, 4:42 PM · Restricted Project