User Details
- User Since
- Dec 4 2018, 6:02 AM (251 w, 3 h)
Thu, Sep 21
Tested on 10k pipelines from Vulkan games, this patch reduces the number of v_writelane instructions from 9842 to 6440 (at the expense of using more loads of course).
Mon, Sep 18
Then, as you say, our register allocation needs to be intelligent enough to keep the matrices packed.
How would you define the instructions for this to work?
Wed, Sep 13
Corrected the use of isCopyInstr.
I have done the first round of testing of the patch on graphics content, with LLPC PatchBufferOp disabled and small modification in LLPC to use ptr addrspace (7) in lieu of lgc.buffer.desc.to.ptr.
Tue, Sep 12
Thanks for the extra info; I understand the problem now: currently there seems to be no way to take advantage of the opsel bit to reuse the same destination matrix registers for two wmma instructions.
Mon, Sep 11
Delete only if all copies are dead, use isCopyInstr.
Fri, Sep 8
Thu, Sep 7
Thanks, that would avoid the regression. However, I still do not fully understand the failing mode - can you show the test case + extra code that triggers the issue?
Mon, Sep 4
Ping.
Tue, Aug 29
Thanks for working on this. Just a few high-level comments:
Mon, Aug 28
Reworked to check for copy.
Aug 25 2023
Ping - I think the only unresolved point is potential weakening of the generic check.
Rebased.
I am trying to understand the failing case better. Can the issue only happen with the extra patch with packing? Is the issue only with zeroinitializers (constant matrices), or it is just where the problem was found?
Aug 3 2023
I am open to suggestions how to handle this more gracefully.
The test case is dependent on AMDGPU-specific D154083, which rematerializes instructions with wide registers. I was not able to observe the erroneous behaviour without it, but should be possible to trigger it somehow.
Jul 28 2023
Thanks, AMD changes LGTM. I spotted this weirdness recently when staring at some remat code for AMDGPU. (It's not a serious issue for us, because the generic check is really conservative for our target, and we typically do not want to block the remat when the generic check returns true).
Jul 26 2023
Jul 21 2023
Added new case with the same reg used for use and def: %0.sub0_sub1:sgpr_256 = S_LOAD_DWORDX2_IMM %0.sub0_sub1:sgpr_256.
Jul 19 2023
Added more tests.
Ran extensive testing on graphics workloads, which uncovered some bugs. Added fixes and more tests for those interesting cases in D154816.
Jul 12 2023
Jul 10 2023
Note: I am aware the "DEFAULT" prefix is currently unused, but was not sure if the intention was for it to be used in the near future - so I am leaving it there for now.
- Addressed review comments.
- Relaxed check to include all invariant loads, not only dereferenceable ones.
- Rebased patch over the commit with new/changed tests.
- Updated MMO with new size in the shrinking path.
Jul 4 2023
Jun 30 2023
Jun 29 2023
Jun 6 2023
Having looked at some real-world graphics content on AMDGPU (with ScalarizeMinBits = 32), I can confirm the usefulness of this patch. I can see more packed instructions generated (for example v_pk_add_f16, v_pk_mul_f16, v_pk_fma_f16).
Yes - this is NFC, but paves the way for other changes.
Jun 2 2023
LGTM with a nit, but please wait for Matt's approval.
May 30 2023
May 15 2023
Apr 7 2023
Thanks for working on this. Just added a couple of nits.
Mar 30 2023
LGTM - looks cleaner than the version before the change.
Mar 16 2023
Mar 3 2023
Would be good to add a test case, but I do realize this can be tricky.
Mar 1 2023
We're seeing some regression coming from this change for AMDGPU backend.
Feb 27 2023
Feb 23 2023
(fixing the spelling error)
Feb 22 2023
Feb 14 2023
Feb 3 2023
Feb 2 2023
Will do extra testing.
Jan 25 2023
Dec 21 2022
Dec 20 2022
Dec 7 2022
Dec 6 2022
Updating patch to address two more issues detected by running tests in debug mode:
- wrong opcodes used in getVGPRSpillSaveOpcode in the original patch
- tests using insufficient spill slot size in my patch
Dec 5 2022
Thanks - the extra set of tests has uncovered another issue in code.
Dec 2 2022
Nov 22 2022
Ping.