piotr (Piotr Sobczak)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 4 2018, 6:02 AM (185 w, 5 d)

Recent Activity

Thu, Jun 16

piotr added inline comments to D127894: AMDGPU: Fix invalid liveness after si-optimize-exec-masking-pre-ra.
Thu, Jun 16, 11:22 PM · Restricted Project, Restricted Project

Tue, May 31

piotr added a comment to D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases.

Any performance numbers? The 8 element case was driven by a specific customer program and the performance of the cmp/select was better than movrel.

I don't know why that would be. Maybe the performance characteristics are different on GFX10+ compared to GFX9.

Also on GFX10+ sgpr usage does not affect occupancy, so perhaps the heuristic could be tweaked to make it more likely to use s_movrel (not v_movrel) on GFX10+.

I will try to get some performance numbers on specific games. Do you know if this performance problem was specific to an architecture?
Like Jay said, I could tweak the heuristic for this and only generate it for GFX10+.

All the tests were done on gfx9 flavors. It is also worth noting gfx9 does not have movrel, but uses s_set_gpr_idx_on instead, which may be the difference.

I.e. it may be linked to FeatureMovrel.

I measured the performance with/without the patch on the original game in which we spotted this issue.
It's a small improvement of about 0.5% in the current version compared to without the patch. This was measured on GFX1030.

It seems to be a little bit faster with the patch, but I can't really test on gfx9 with this game.

If it's better to keep the initial version for gfx9, I could:

  • Only do this patch for GFX10+
  • Add some flag?
Tue, May 31, 6:55 AM · Restricted Project, Restricted Project, Restricted Project

May 20 2022

piotr added a comment to D126064: [AMDGPU] Handle mandatory literals in isOperandLegal.

Just verified this patch fixes the rendering corruption caused by D114643 - thanks!

May 20 2022, 7:13 AM · Restricted Project, Restricted Project

May 17 2022

piotr added a comment to D114644: [AMDGPU] Aggressively fold immediates in SIShrinkInstructions.

I am strongly in favour of this change. On our target, the sgpr improvements that manifest themselves in reduced sgpr spilling outweigh the code size increase.

May 17 2022, 3:30 AM · Restricted Project, Restricted Project

May 13 2022

piotr added inline comments to D124884: [AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds.
May 13 2022, 2:31 AM · Restricted Project, Restricted Project
piotr added a comment to D124450: [AMDGPU] Remove hasOneUse check from scalar select pattern.

Looks good to me - the extra s_cselects generated are worth the complexity arising from this patch.

May 13 2022, 1:00 AM · Restricted Project, Restricted Project

Apr 29 2022

piotr added a comment to D124450: [AMDGPU] Remove hasOneUse check from scalar select pattern.

The biggest headache comes from the fact that during moveToVALU, when S_CMP gets converted to V_CMP, the users of SCC need to be handled properly; otherwise you end up with a weird copy from SCC. I think this is handled right now by adding an extra copy from VCC to SCC to make the connection between V_CMP and S_CSELECT until it is time for the handling of S_CSELECT. This gets trickier when there are more uses of SCC, I guess.

Apr 29 2022, 8:40 AM · Restricted Project, Restricted Project

Apr 28 2022

piotr added a comment to D124232: [AMDGPU] Use d16 flag for image.sample instructions.

I wanted to add this combine before, but I don’t think there is a way to add d16 to an instruction without potentially breaking the code.
The reason is, when an image_sample has the d16 flag enabled, it will use f32→f16 truncation or i32→i16 truncation, depending on the texture format in the descriptor.

Combining image_sample+fptrunc to image_sample d16 works fine for float textures, but I assume we don’t know at compile time if a texture is an integer or float texture.
The application may interpret stored values as float and do an fptrunc, but the texture is actually defined as an integer texture, so the hardware uses an integer trunc instead, giving different results.

Apr 28 2022, 1:08 AM · Restricted Project, Restricted Project

Apr 27 2022

piotr accepted D124232: [AMDGPU] Use d16 flag for image.sample instructions.

LGTM

Apr 27 2022, 12:27 AM · Restricted Project, Restricted Project

Apr 26 2022

piotr added a comment to D124450: [AMDGPU] Remove hasOneUse check from scalar select pattern.

I'd love this to go in, but when I added the hasOneUse() check it was certainly needed. If my old notes serve me well, there was a crash in the ctlz.ll test and I concluded this check was needed to avoid some shenanigans in si-fix-sgpr-copies. I need to double check if the issue has been fixed or is just hidden.

Apr 26 2022, 9:19 AM · Restricted Project, Restricted Project

Apr 25 2022

piotr added a reverting change for rGd1762fc454c0: [AMDGPU] Use d16 flag for image.sample instructions: rGc6afbdb5d2a0: Revert "[AMDGPU] Use d16 flag for image.sample instructions".
Apr 25 2022, 8:21 AM · Restricted Project, Restricted Project
piotr committed rGc6afbdb5d2a0: Revert "[AMDGPU] Use d16 flag for image.sample instructions" (authored by piotr).
Revert "[AMDGPU] Use d16 flag for image.sample instructions"
Apr 25 2022, 8:21 AM · Restricted Project, Restricted Project
piotr added a reverting change for D124232: [AMDGPU] Use d16 flag for image.sample instructions: rGc6afbdb5d2a0: Revert "[AMDGPU] Use d16 flag for image.sample instructions".
Apr 25 2022, 8:21 AM · Restricted Project, Restricted Project

Mar 22 2022

piotr added a comment to D115747: [AMDGPU] Flush the vmcnt counter in loop preheader when necessary.

@piotr I ran compile time testing and the patch has no significant impact. Worst case is 1.1% and is the only one above 1%. Average is below 0.1%.

Mar 22 2022, 11:57 PM · Restricted Project, Restricted Project, Restricted Project
piotr added a comment to D115747: [AMDGPU] Flush the vmcnt counter in loop preheader when necessary.

Do you have any data on the compile time impact?

Mar 22 2022, 12:52 AM · Restricted Project, Restricted Project, Restricted Project

Mar 1 2022

piotr accepted D120704: [AMDGPU] Handle legacy multiply-accumulate opcodes in convertToThreeAddress.
Mar 1 2022, 7:15 AM · Restricted Project
piotr accepted D120703: [AMDGPU] Disentangle MFMA handling in convertToThreeAddress. NFC..
Mar 1 2022, 7:08 AM · Restricted Project

Feb 18 2022

piotr accepted D120023: [AMDGPU] Return better Changed status from SIFoldOperands.

LGTM

Feb 18 2022, 2:30 AM · Restricted Project
piotr added inline comments to D120023: [AMDGPU] Return better Changed status from SIFoldOperands.
Feb 18 2022, 2:12 AM · Restricted Project
piotr accepted D120025: [AMDGPU] Return better Changed status from SILowerControlFlow.

LGTM

Feb 18 2022, 2:09 AM · Restricted Project
piotr added inline comments to D120025: [AMDGPU] Return better Changed status from SILowerControlFlow.
Feb 18 2022, 2:04 AM · Restricted Project
piotr accepted D120024: [AMDGPU] Return better Changed status from SIOptimizeExecMasking.

LGTM

Feb 18 2022, 1:56 AM · Restricted Project

Feb 17 2022

piotr accepted D119945: [AMDGPU] Return better Changed status from SIAnnotateControlFlow.
Feb 17 2022, 12:52 AM · Restricted Project
piotr added inline comments to D119954: [CodeGen] Return better Changed status from PostRAHazardRecognizer.
Feb 17 2022, 12:51 AM · Restricted Project

Feb 16 2022

piotr added inline comments to D119945: [AMDGPU] Return better Changed status from SIAnnotateControlFlow.
Feb 16 2022, 12:08 PM · Restricted Project
piotr added inline comments to D119954: [CodeGen] Return better Changed status from PostRAHazardRecognizer.
Feb 16 2022, 12:04 PM · Restricted Project
piotr accepted D119943: [AMDGPU] Return better Changed status from AMDGPUAnnotateUniformValues.

LGTM

Feb 16 2022, 11:58 AM · Restricted Project
piotr accepted D119944: [AMDGPU] Return better Changed status from AMDGPUPerfHintAnalysis.

LGTM

Feb 16 2022, 11:57 AM · Restricted Project
piotr accepted D119946: [AMDGPU] Return better Changed status from SILowerI1Copies.

LGTM

Feb 16 2022, 11:51 AM · Restricted Project

Feb 7 2022

piotr added a comment to D119006: [AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases.

So that indicates an improvement in the average vgpr count.

Yes.

Btw, does the patch result in a noticeably smaller number of merges on average?

No, to my surprise there was absolutely no difference in the amount of merging in any of the 10,000 shaders. I checked by diffing the instruction mix for each shader. The only differences were in VALU and SALU instructions.

Feb 7 2022, 11:48 PM · Restricted Project
piotr added a comment to D119006: [AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases.

So that indicates an improvement in the average vgpr count.

Feb 7 2022, 9:19 AM · Restricted Project

Feb 4 2022

piotr accepted D118994: [AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI..

LGTM

Feb 4 2022, 8:45 AM · Restricted Project

Jan 28 2022

piotr accepted D118384: [AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects.

LGTM

Jan 28 2022, 3:53 AM · Restricted Project

Jan 27 2022

piotr accepted D118267: [AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access.

LGTM

Jan 27 2022, 6:34 AM · Restricted Project

Jan 19 2022

piotr committed rG8dfb417e67e3: [AMDGPU] Fix missing waitcnt issue (authored by piotr).
[AMDGPU] Fix missing waitcnt issue
Jan 19 2022, 1:55 AM
piotr closed D117544: [AMDGPU] Fix missing waitcnt issue.
Jan 19 2022, 1:55 AM · Restricted Project

Jan 18 2022

piotr updated the diff for D117544: [AMDGPU] Fix missing waitcnt issue.

Remove references to "waw" as the issue can also trigger in other scenarios, as Jay pointed out to me (thanks).

Jan 18 2022, 4:53 AM · Restricted Project
piotr added a comment to D117544: [AMDGPU] Fix missing waitcnt issue.

In our usual compile-time tests it shows 0.056% degradation on average, worst case 0.9%.

Jan 18 2022, 1:27 AM · Restricted Project
piotr added inline comments to D117544: [AMDGPU] Fix missing waitcnt issue.
Jan 18 2022, 1:10 AM · Restricted Project
piotr added reviewers for D117544: [AMDGPU] Fix missing waitcnt issue: foad, arsenm, rampitec, kerbowa, bsaleil.
Jan 18 2022, 12:23 AM · Restricted Project
piotr requested review of D117544: [AMDGPU] Fix missing waitcnt issue.
Jan 18 2022, 12:19 AM · Restricted Project

Nov 3 2021

piotr added a comment to D108830: [AMDGPU] Propagate defining src reg for AGPR to AGPR Copys.

For the record, D113005 indeed fixes the CTS issue I was seeing with D108830. Thanks.

Nov 3 2021, 2:09 AM · Restricted Project
piotr committed rG03961709edd1: [InstCombine] Extend pattern to replace shuffle's insertelement operand (authored by piotr).
[InstCombine] Extend pattern to replace shuffle's insertelement operand
Nov 3 2021, 1:44 AM
piotr closed D112318: [InstCombine] Extend pattern to replace shuffle's insertelement operand.
Nov 3 2021, 1:44 AM · Restricted Project

Nov 2 2021

piotr added a comment to D112318: [InstCombine] Extend pattern to replace shuffle's insertelement operand.

Ping.

Nov 2 2021, 12:02 AM · Restricted Project

Oct 26 2021

piotr accepted D101825: [AMDGPU] Use standard MachineBasicBlock::getFallThrough method. NFCI..

LGTM.

Oct 26 2021, 3:06 AM · Restricted Project

Oct 22 2021

piotr added reviewers for D112318: [InstCombine] Extend pattern to replace shuffle's insertelement operand: spatel, efriedma, nlopes, aqjune, lebedev.ri, foad.
Oct 22 2021, 8:19 AM · Restricted Project
piotr added a comment to D112318: [InstCombine] Extend pattern to replace shuffle's insertelement operand.

Tests pre-committed in 7457fe3dd44a0bc4b0296149c48188accefda5fa.

Oct 22 2021, 8:17 AM · Restricted Project
piotr requested review of D112318: [InstCombine] Extend pattern to replace shuffle's insertelement operand.
Oct 22 2021, 8:17 AM · Restricted Project
piotr committed rG7457fe3dd44a: [InstCombine][NFC] Precommit new tests (authored by piotr).
[InstCombine][NFC] Precommit new tests
Oct 22 2021, 8:16 AM

Oct 18 2021

piotr committed rGd86992100452: [AMDGPU] Add patterns for i8/i16 local atomic load/store (authored by piotr).
[AMDGPU] Add patterns for i8/i16 local atomic load/store
Oct 18 2021, 2:25 AM
piotr closed D111869: [AMDGPU] Add patterns for i8/i16 local atomic load/store.
Oct 18 2021, 2:25 AM · Restricted Project

Oct 15 2021

piotr updated the summary of D111869: [AMDGPU] Add patterns for i8/i16 local atomic load/store.
Oct 15 2021, 3:55 AM · Restricted Project
piotr updated the diff for D111869: [AMDGPU] Add patterns for i8/i16 local atomic load/store.

Added -global-isel-abort=0 and restored the tests.

Oct 15 2021, 3:55 AM · Restricted Project
piotr added reviewers for D111869: [AMDGPU] Add patterns for i8/i16 local atomic load/store: arsenm, rampitec, foad.
Oct 15 2021, 2:33 AM · Restricted Project
piotr requested review of D111869: [AMDGPU] Add patterns for i8/i16 local atomic load/store.
Oct 15 2021, 2:32 AM · Restricted Project

Oct 14 2021

piotr added a comment to D111646: [AMDGPU] Enable load clustering in the post-RA scheduler.

Sounds good to me. The runtime improvement from clustering is notoriously difficult to assess, but your static data shows some potential benefit.

Oct 14 2021, 12:35 AM · Restricted Project

Sep 30 2021

piotr added inline comments to D108830: [AMDGPU] Propagate defining src reg for AGPR to AGPR Copys.
Sep 30 2021, 8:03 AM · Restricted Project

Sep 23 2021

piotr committed rG2ac53fffaeda: [AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for… (authored by piotr).
[AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for…
Sep 23 2021, 7:47 AM
piotr closed D109961: [AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders.
Sep 23 2021, 7:47 AM · Restricted Project

Sep 21 2021

piotr added a reviewer for D109961: [AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders: bsaleil.
Sep 21 2021, 8:04 AM · Restricted Project
piotr accepted D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.
Sep 21 2021, 1:05 AM · Restricted Project

Sep 20 2021

piotr added a comment to D109961: [AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders.

I was thinking about this, but I am not sure what the test would demonstrate, as the patch just limits the number of cases for which this pass works. With this patch, there are no test changes in the existing set of tests, as expected.

Sep 20 2021, 7:53 AM · Restricted Project

Sep 17 2021

piotr retitled D109961: [AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders from [AMDGPU] Avoid calling amdgpu-propagate-attributes pass for shaders to [AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders.
Sep 17 2021, 5:54 AM · Restricted Project
piotr added reviewers for D109961: [AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders: arsenm, rampitec, JonChesterfield.
Sep 17 2021, 5:49 AM · Restricted Project
piotr requested review of D109961: [AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders.
Sep 17 2021, 5:48 AM · Restricted Project
piotr added a comment to D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.

The patch looks good to me (modulo the lint warnings).

Sep 17 2021, 12:26 AM · Restricted Project

Sep 14 2021

piotr added a comment to D109754: AMDGPU: Use -1/0 when copying from SCC to SGPR.

The change makes sense to me in general.

Sep 14 2021, 5:34 AM · Restricted Project

Sep 7 2021

piotr added a comment to D109159: [amdgpu] Enable selection of `s_cselect_b64`..

Just for the record, see also this comment from D82370:

Additionally, remove pattern for selects with 64-bit
inputs, which are rare, because handling them properly
requires more thought.

This reverted part of D81925, which originally added both 32- and 64-bit patterns.

In fact, we found that 64-bit select is not rare, partially due to the use of 64-bit pointers (which tend to be uniform in most cases). This patch was developed to address that pending issue in D81925.

Sep 7 2021, 9:03 AM · Restricted Project

Sep 2 2021

piotr committed rG30d6c39bca6c: [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM (authored by piotr).
[AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM
Sep 2 2021, 7:28 AM
piotr closed D108909: [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM.
Sep 2 2021, 7:28 AM · Restricted Project

Aug 30 2021

piotr added reviewers for D108909: [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM: arsenm, foad.
Aug 30 2021, 2:19 AM · Restricted Project
piotr updated the diff for D108909: [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM.

Add test.

Aug 30 2021, 2:18 AM · Restricted Project
piotr requested review of D108909: [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM.
Aug 30 2021, 2:17 AM · Restricted Project

Jun 29 2021

piotr committed rGf38a8b54ea31: [AMDGPU] Fix 224-bit spills (authored by piotr).
[AMDGPU] Fix 224-bit spills
Jun 29 2021, 8:53 AM
piotr closed D105109: Fix 224-bit spills.
Jun 29 2021, 8:53 AM · Restricted Project
piotr updated the diff for D105109: Fix 224-bit spills.

Test case added.

Jun 29 2021, 8:21 AM · Restricted Project
piotr added reviewers for D105109: Fix 224-bit spills: rampitec, arsenm, foad, critson.
Jun 29 2021, 6:14 AM · Restricted Project
piotr requested review of D105109: Fix 224-bit spills.
Jun 29 2021, 6:11 AM · Restricted Project

Jun 14 2021

piotr committed rGe0c382a9d5a0: [AMDGPU] Limit runs of fixLdsBranchVmemWARHazard (authored by piotr).
[AMDGPU] Limit runs of fixLdsBranchVmemWARHazard
Jun 14 2021, 1:31 PM
piotr closed D104219: [AMDGPU] Limit runs of fixLdsBranchVmemWARHazard.
Jun 14 2021, 1:31 PM · Restricted Project
piotr updated the summary of D104219: [AMDGPU] Limit runs of fixLdsBranchVmemWARHazard.
Jun 14 2021, 5:07 AM · Restricted Project
piotr added reviewers for D104219: [AMDGPU] Limit runs of fixLdsBranchVmemWARHazard: arsenm, rampitec, foad, bsaleil.
Jun 14 2021, 5:06 AM · Restricted Project
piotr requested review of D104219: [AMDGPU] Limit runs of fixLdsBranchVmemWARHazard.
Jun 14 2021, 5:05 AM · Restricted Project

May 12 2021

piotr added a comment to D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts.

Yes, based on Matt's last comment, there is still a potential problem even though my patch significantly reduces the likelihood of it occurring.

May 12 2021, 6:11 AM · Restricted Project
piotr added a comment to D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts.

Assert removed in a4db7025a9762c568c7bc9fdd3c64f4a60e31cfc.

May 12 2021, 5:54 AM · Restricted Project
piotr committed rGa4db7025a976: [AMDGPU] Remove assert (authored by piotr).
[AMDGPU] Remove assert
May 12 2021, 5:54 AM
piotr committed rG68137ef5682f: [AMDGPU] Skip invariant loads when avoiding WAR conflicts (authored by piotr).
[AMDGPU] Skip invariant loads when avoiding WAR conflicts
May 12 2021, 1:58 AM
piotr closed D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts.
May 12 2021, 1:58 AM · Restricted Project
piotr updated the diff for D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts.

Added the assert. The assert seems in order here - no hits in the lit tests or Vulkan CTS. There would have been hits in 194 lit tests if the assert had been placed here without the isInvariant check, which somewhat proves the usefulness of the patch.

May 12 2021, 1:22 AM · Restricted Project

May 11 2021

piotr committed rG09fe84abb4ee: [AMDGPU] Move code sinking before structurizer (authored by piotr).
[AMDGPU] Move code sinking before structurizer
May 11 2021, 5:35 AM
piotr closed D101115: [AMDGPU] Move code sinking before structurizer.
May 11 2021, 5:34 AM · Restricted Project

May 10 2021

piotr updated the diff for D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts.

Added mir test.

May 10 2021, 2:08 AM · Restricted Project

May 7 2021

piotr retitled D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts from [AMDGPU] Avoid adding nullptr keys to hash table to [AMDGPU] Skip invariant loads when avoiding WAR conflicts.
May 7 2021, 2:04 PM · Restricted Project
piotr updated the diff for D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts.

Re-purposing patch to not add invariant loads to the map.

May 7 2021, 2:03 PM · Restricted Project

May 5 2021

piotr accepted D101966: [AMDGPU] Fix WQM failure with single block inactive demote.
May 5 2021, 11:47 PM · Restricted Project

Apr 26 2021

piotr added a reviewer for D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts: alex-t.
Apr 26 2021, 2:39 AM · Restricted Project
piotr added inline comments to D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts.
Apr 26 2021, 2:39 AM · Restricted Project

Apr 23 2021

piotr added reviewers for D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts: foad, arsenm, rampitec.
Apr 23 2021, 9:22 AM · Restricted Project
piotr requested review of D101177: [AMDGPU] Skip invariant loads when avoiding WAR conflicts.
Apr 23 2021, 9:21 AM · Restricted Project