

skc7 (krishna chaitanya sankisa)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 1 2021, 3:10 AM (99 w, 5 d)

Recent Activity

Aug 17 2023

skc7 updated the diff for D158150: [AMDGPU] Add dynamic LDS size implicit kernel argument to v5.

Update alignment for int32 type lds_size argument

Aug 17 2023, 10:06 PM · Restricted Project, Restricted Project
skc7 retitled D158150: [AMDGPU] Add dynamic LDS size implicit kernel argument to v5 from [WIP] Add dynamic LDS size implicit argument to v5 to [AMDGPU] Add dynamic LDS size implicit argument to v5.
Aug 17 2023, 7:45 AM · Restricted Project, Restricted Project
skc7 added reviewers for D158150: [AMDGPU] Add dynamic LDS size implicit kernel argument to v5: b-sumner, arsenm.
Aug 17 2023, 4:12 AM · Restricted Project, Restricted Project
skc7 updated subscribers of D158150: [AMDGPU] Add dynamic LDS size implicit kernel argument to v5.

Why do we need this? I thought this was already available in the dispatch packet; you just need to subtract the statically known size.

Aug 17 2023, 4:11 AM · Restricted Project, Restricted Project
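
The subtraction suggested in the comment above can be sketched as follows. This is only an illustration, not code from D158150: the function and parameter names are hypothetical, and it assumes the total LDS request is read from the dispatch packet's group segment size field while the static part is known at compile time.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not part of the patch): derive the dynamic LDS size
// from the total group segment size requested at dispatch time and the
// statically known LDS size of the kernel.
// Returns std::nullopt when the inputs are inconsistent, i.e. the dispatch
// requested less LDS than the kernel statically needs.
std::optional<uint32_t> dynamicLdsSize(uint32_t groupSegmentSize,
                                       uint32_t staticLdsSize) {
  if (groupSegmentSize < staticLdsSize)
    return std::nullopt;
  return groupSegmentSize - staticLdsSize;
}
```

For example, a dispatch that requests 4096 bytes for a kernel with 1024 bytes of static LDS leaves 3072 bytes of dynamic LDS under this model.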
skc7 updated the diff for D158150: [AMDGPU] Add dynamic LDS size implicit kernel argument to v5.

Make the LDS size argument a 32-bit int type.

Aug 17 2023, 4:02 AM · Restricted Project, Restricted Project
skc7 retitled D158150: [AMDGPU] Add dynamic LDS size implicit kernel argument to v5 from [WIP] Add dynamic LDS implicit argument to v5 to [WIP] Add dynamic LDS size implicit argument to v5.
Aug 17 2023, 2:05 AM · Restricted Project, Restricted Project
skc7 requested review of D158150: [AMDGPU] Add dynamic LDS size implicit kernel argument to v5.
Aug 17 2023, 12:19 AM · Restricted Project, Restricted Project

Jul 19 2023

skc7 abandoned D134423: [AMDGPU] Fix vgpr2sgpr copy analysis to check scalar operands of buffer instructions use scalar registers..
Jul 19 2023, 10:28 PM · Restricted Project, Restricted Project

Jun 5 2023

skc7 abandoned D143335: [AMDGPU] Use instruction order in machine function to process workList of moveToVALU.
Jun 5 2023, 8:09 AM · Restricted Project, Restricted Project

May 30 2023

skc7 updated the summary of D148142: [AMDGPU] Update Subtarget isWave32 method to ignore the wave32 feature pre-gfx9.
May 30 2023, 9:49 PM · Restricted Project, Restricted Project
skc7 updated the diff for D148142: [AMDGPU] Update Subtarget isWave32 method to ignore the wave32 feature pre-gfx9.

Update test.

May 30 2023, 9:49 PM · Restricted Project, Restricted Project
skc7 updated the diff for D148142: [AMDGPU] Update Subtarget isWave32 method to ignore the wave32 feature pre-gfx9.
May 30 2023, 2:15 AM · Restricted Project, Restricted Project

May 29 2023

skc7 added a comment to D148142: [AMDGPU] Update Subtarget isWave32 method to ignore the wave32 feature pre-gfx9.

If we’re going to do this, no changes should be needed in SIFrameLowering. The wavesize set in the subtarget constructor would be adjusted to consistently apply this everywhere. The prolog code is a very minor piece of this.

May 29 2023, 3:45 AM · Restricted Project, Restricted Project
skc7 updated the diff for D148142: [AMDGPU] Update Subtarget isWave32 method to ignore the wave32 feature pre-gfx9.
May 29 2023, 2:24 AM · Restricted Project, Restricted Project

May 19 2023

skc7 committed rG663bb5a5f7ff: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current… (authored by skc7).
[AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current…
May 19 2023, 11:21 AM · Restricted Project, Restricted Project
skc7 closed D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.
May 19 2023, 11:21 AM · Restricted Project, Restricted Project
skc7 added inline comments to D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.
May 19 2023, 9:36 AM · Restricted Project, Restricted Project

May 11 2023

skc7 updated the diff for D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.

Rebase.

May 11 2023, 10:42 PM · Restricted Project, Restricted Project

May 5 2023

skc7 updated the diff for D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.

Updated patch as per feedback from @Pierre-vh

May 5 2023, 9:02 AM · Restricted Project, Restricted Project

May 4 2023

skc7 updated the diff for D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.

expandImpliedFeatures returns all the features implied by the current GPU. However, wavefrontsize32 and wavefrontsize64 are not part of the feature list for gfx10 and above targets in AMDGPU.td. AFAIU, those subtargets are assumed to support both features when neither appears in the list, so GPUFeatureBits cannot be relied on to query FeatureWavefrontSize32/64.

May 4 2023, 8:10 PM · Restricted Project, Restricted Project

May 1 2023

skc7 added inline comments to D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.
May 1 2023, 1:35 AM · Restricted Project, Restricted Project
skc7 updated the diff for D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.

Remove wavefrontsize64 function if compiling in wave32 mode.

May 1 2023, 1:24 AM · Restricted Project, Restricted Project

Apr 29 2023

skc7 updated the diff for D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.

Updated patch to delete the function if FeatureWavefrontSize32 is not supported by current GPU.

Apr 29 2023, 2:57 AM · Restricted Project, Restricted Project

Apr 27 2023

skc7 committed rGe016fb57b353: [AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic. (authored by skc7).
[AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic.
Apr 27 2023, 7:08 AM · Restricted Project, Restricted Project
skc7 closed D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..
Apr 27 2023, 7:08 AM · Restricted Project, Restricted Project

Apr 25 2023

skc7 updated the diff for D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..

Rebase.

Apr 25 2023, 4:49 AM · Restricted Project, Restricted Project

Apr 21 2023

skc7 requested review of D148906: [AMDGPU] Remove function if FeatureWavefrontSize 32 is not supported on current GPU.
Apr 21 2023, 1:56 AM · Restricted Project, Restricted Project

Apr 17 2023

skc7 added inline comments to D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..
Apr 17 2023, 9:06 AM · Restricted Project, Restricted Project

Apr 14 2023

skc7 updated the diff for D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..

Port MBUF related tests from global-isel containing all the test combinations.

Apr 14 2023, 1:46 AM · Restricted Project, Restricted Project

Apr 12 2023

skc7 added a comment to D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.

@foad As per your feedback, the changes to the sub.ll test have been made and merged upstream: LINK

Apr 12 2023, 9:51 AM · Restricted Project, Restricted Project
skc7 requested review of D148142: [AMDGPU] Update Subtarget isWave32 method to ignore the wave32 feature pre-gfx9.
Apr 12 2023, 9:18 AM · Restricted Project, Restricted Project

Apr 11 2023

skc7 committed rG97f8d6b2ecb2: [AMDGPU][NFC] Regenerate test checks for sub.ll (authored by skc7).
[AMDGPU][NFC] Regenerate test checks for sub.ll
Apr 11 2023, 5:27 AM · Restricted Project, Restricted Project

Apr 10 2023

skc7 committed rG635c725b3031: [AMDGPU][NFC] Regenerate test checks for merge-tbuffer.mir (authored by skc7).
[AMDGPU][NFC] Regenerate test checks for merge-tbuffer.mir
Apr 10 2023, 6:24 AM · Restricted Project, Restricted Project
skc7 updated the diff for D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..

Rebase. Update test.

Apr 10 2023, 3:06 AM · Restricted Project, Restricted Project

Apr 9 2023

skc7 committed rGb434051dc83d: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU (authored by skc7).
[AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU
Apr 9 2023, 11:44 PM · Restricted Project, Restricted Project
skc7 closed D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.
Apr 9 2023, 11:43 PM · Restricted Project, Restricted Project

Apr 6 2023

skc7 updated the summary of D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.
Apr 6 2023, 2:38 AM · Restricted Project, Restricted Project

Apr 4 2023

skc7 updated the diff for D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.

Changes done as per review by @arsenm

Apr 4 2023, 8:29 AM · Restricted Project, Restricted Project
skc7 added a comment to D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Hi @nikic. I have made the changes that were suggested previously. Could you please review?

Apr 4 2023, 4:15 AM · Restricted Project, Restricted Project
skc7 added inline comments to D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.
Apr 4 2023, 4:11 AM · Restricted Project, Restricted Project
skc7 updated the diff for D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.

Rebase. Update test control-flow-fastregalloc.ll

Apr 4 2023, 3:58 AM · Restricted Project, Restricted Project

Apr 2 2023

skc7 updated the diff for D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.

Rebase.

Apr 2 2023, 10:39 PM · Restricted Project, Restricted Project
skc7 updated the diff for D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.

Changes done as per @arsenm review.

Apr 2 2023, 10:31 PM · Restricted Project, Restricted Project

Mar 30 2023

skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Rebase

Mar 30 2023, 3:18 AM · Restricted Project, Restricted Project
skc7 updated the diff for D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.

Fix mul.ll test failure.

Mar 30 2023, 1:27 AM · Restricted Project, Restricted Project

Mar 29 2023

skc7 requested review of D147168: [AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU.
Mar 29 2023, 9:28 AM · Restricted Project, Restricted Project

Mar 16 2023

skc7 requested review of D146223: [WIP] Use RPOT to process worklist of moveToVALU.
Mar 16 2023, 6:22 AM · Restricted Project, Restricted Project

Mar 10 2023

skc7 added inline comments to D143335: [AMDGPU] Use instruction order in machine function to process workList of moveToVALU.
Mar 10 2023, 8:06 PM · Restricted Project, Restricted Project

Feb 28 2023

skc7 updated the diff for D143335: [AMDGPU] Use instruction order in machine function to process workList of moveToVALU.

Use ReversePostOrderTraversal list to compare machine instructions from different basic blocks.

Feb 28 2023, 4:57 AM · Restricted Project, Restricted Project

Feb 21 2023

skc7 added inline comments to D143335: [AMDGPU] Use instruction order in machine function to process workList of moveToVALU.
Feb 21 2023, 4:14 AM · Restricted Project, Restricted Project

Feb 13 2023

skc7 added inline comments to D143335: [AMDGPU] Use instruction order in machine function to process workList of moveToVALU.
Feb 13 2023, 8:36 AM · Restricted Project, Restricted Project
skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Rebase. Ping

Feb 13 2023, 4:00 AM · Restricted Project, Restricted Project

Feb 10 2023

skc7 added inline comments to D143335: [AMDGPU] Use instruction order in machine function to process workList of moveToVALU.
Feb 10 2023, 2:12 AM · Restricted Project, Restricted Project

Feb 9 2023

skc7 updated the diff for D143335: [AMDGPU] Use instruction order in machine function to process workList of moveToVALU.

Introduce SIInstrWorklist. It wraps a std::set with a comparison operator that keeps instructions in their order within the machine function. This is meant to avoid the previous issue of sorting the vector in every iteration.

Feb 9 2023, 3:48 AM · Restricted Project, Restricted Project
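
The ordered-worklist idea described in the update above can be sketched roughly as below. This is a toy model and not the SIInstrWorklist from the patch: `Instr`, `ByPosition`, `OrderedWorklist`, and `drain` are hypothetical names, and an integer `position` stands in for an instruction's place in the machine function.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; `position` models its order
// in the function (an assumption made for this sketch).
struct Instr {
  std::size_t position;
  std::string name;
};

// Comparison operator that orders worklist entries by function position.
struct ByPosition {
  bool operator()(const Instr *a, const Instr *b) const {
    return a->position < b->position;
  }
};

// The set keeps entries sorted on insertion and drops duplicates, so no
// explicit re-sort is needed before each iteration.
using OrderedWorklist = std::set<const Instr *, ByPosition>;

// Pop entries front to back and record the order in which they are seen.
std::vector<std::string> drain(OrderedWorklist &worklist) {
  std::vector<std::string> order;
  while (!worklist.empty()) {
    auto first = worklist.begin();
    order.push_back((*first)->name);
    worklist.erase(first);
  }
  return order;
}
```

Inserting instructions in any order and then draining the set visits them in function order, which is the property the update relies on.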

Feb 5 2023

skc7 requested review of D143335: [AMDGPU] Use instruction order in machine function to process workList of moveToVALU.
Feb 5 2023, 2:23 AM · Restricted Project, Restricted Project

Jan 21 2023

skc7 added a comment to D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..

I think this will mishandle the case where both the SRD and the soffset are VGPRs. You need to handle both at the same time in one waterfall loop (this should show up if your tests used a meaningful SRD). You can also just look into the globalisel tests for these intrinsics, they test all the permutations already

Jan 21 2023, 10:44 AM · Restricted Project, Restricted Project
skc7 updated the diff for D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..

Make changes to legalize soffset and rsrc together.

Jan 21 2023, 10:32 AM · Restricted Project, Restricted Project

Jan 16 2023

skc7 added inline comments to D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.
Jan 16 2023, 8:20 PM · Restricted Project, Restricted Project
skc7 updated the diff for D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..

Rebase

Jan 16 2023, 8:16 PM · Restricted Project, Restricted Project

Jan 10 2023

skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Rebase

Jan 10 2023, 10:21 PM · Restricted Project, Restricted Project

Jan 9 2023

skc7 retitled D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic. from [WIP][AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic. to [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..
Jan 9 2023, 3:01 AM · Restricted Project, Restricted Project
skc7 updated the diff for D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..

Rebase

Jan 9 2023, 2:56 AM · Restricted Project, Restricted Project

Jan 5 2023

skc7 updated the diff for D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..

Use loadMBUFScalarOperandFromVGPR to build waterfall loop for soffset and srsrc.

Jan 5 2023, 8:58 AM · Restricted Project, Restricted Project

Jan 4 2023

skc7 requested review of D141030: [AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic..
Jan 4 2023, 9:48 PM · Restricted Project, Restricted Project

Jan 2 2023

skc7 added inline comments to D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.
Jan 2 2023, 1:11 AM · Restricted Project, Restricted Project
skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Remove check for src array type.
Drop last zero from resulting NewIndices after matching the types.
Drop triple from test. Updated test checks with update_test_checks.py.
Added new tests for struct of array type and to check new zero index in resulting gep.

Jan 2 2023, 1:08 AM · Restricted Project, Restricted Project

Dec 23 2022

skc7 added a comment to D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

I think in terms of general approach, the better way to handle this would be to relax the Src->getResultElementType() != GEP.getSourceElementType() check to allow the transform if the types can be made to match by adding additional zero indices.

(As a meta note, I think we'd probably be better off considering GEPs with multiple dynamic indices to be non-canonical, and try to split them up instead, e.g. because this allows LICM of an invariant part. But I guess that's not our current stance. It would be interesting to hear where you have found this merging to be beneficial.)

Dec 23 2022, 8:23 AM · Restricted Project, Restricted Project
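
The relaxed type check suggested in the comment above can be illustrated with a toy type model. This is a sketch under stated assumptions, not InstCombine code: `Type`, `firstElement`, and `zeroIndicesToMatch` are hypothetical, and each extra zero index is modelled as stepping into the first element of an aggregate.

```cpp
#include <optional>
#include <string>

// Toy type node: `firstElement` is the type reached by one additional
// zero index (array element or first struct field); nullptr for scalars.
struct Type {
  std::string name;
  const Type *firstElement;
};

// Returns how many trailing zero indices must be appended so that a GEP
// whose result element type is `from` ends up at `target`, or
// std::nullopt if no chain of zero indices gets there.
std::optional<unsigned> zeroIndicesToMatch(const Type *from,
                                           const Type *target) {
  unsigned count = 0;
  for (const Type *t = from; t != nullptr; t = t->firstElement, ++count) {
    if (t == target)
      return count;
  }
  return std::nullopt;
}
```

In this model, matching a struct that starts with an array of i32 against i32 needs two zero indices, while two identical types need none.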

Dec 15 2022

skc7 updated the diff for D139817: [AMDGPU] Legalize soffset of buffer instruction.

Rebase. Use poison for srsrc operands.

Dec 15 2022, 2:31 AM · Restricted Project, Restricted Project
skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Rebase. Ping

Dec 15 2022, 12:57 AM · Restricted Project, Restricted Project

Dec 13 2022

skc7 updated the diff for D139817: [AMDGPU] Legalize soffset of buffer instruction.

Removed inreg from the params of tests. Made the srsrc operand undef.

Dec 13 2022, 1:31 AM · Restricted Project, Restricted Project

Dec 12 2022

skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Rebase. Use -passes=instcombine in test.

Dec 12 2022, 9:01 PM · Restricted Project, Restricted Project
skc7 requested review of D139817: [AMDGPU] Legalize soffset of buffer instruction.
Dec 12 2022, 1:15 AM · Restricted Project, Restricted Project

Dec 9 2022

skc7 updated the diff for D134423: [AMDGPU] Fix vgpr2sgpr copy analysis to check scalar operands of buffer instructions use scalar registers..

Made changes to identify only a copy whose result is used by the soffset of a MUBUF/MTBUF instruction. needToBeConvertedToVALU returns false if such a pattern is found.
This also fixes the vgpr-descriptor-waterfall-loop-idom-update.ll test, where the previous revision of the patch did not generate a waterfall loop.

Dec 9 2022, 3:29 AM · Restricted Project, Restricted Project

Dec 7 2022

skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Rebase.

Dec 7 2022, 9:06 AM · Restricted Project, Restricted Project

Dec 5 2022

skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Thanks @nikic for feedback. Made changes to match Src ResultElementType with GEP SrcElementType by adding additional zero indices to Src gep.

Dec 5 2022, 7:06 PM · Restricted Project, Restricted Project

Dec 4 2022

skc7 added inline comments to D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.
Dec 4 2022, 11:51 PM · Restricted Project, Restricted Project
skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Update code as per @jmmartinez comments.

Dec 4 2022, 11:48 PM · Restricted Project, Restricted Project

Nov 30 2022

skc7 added inline comments to D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.
Nov 30 2022, 9:03 PM · Restricted Project, Restricted Project
skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Added comments to tests.

Nov 30 2022, 8:55 PM · Restricted Project, Restricted Project
skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Rebase.

Nov 30 2022, 7:45 AM · Restricted Project, Restricted Project
skc7 added a comment to D134423: [AMDGPU] Fix vgpr2sgpr copy analysis to check scalar operands of buffer instructions use scalar registers..

BTW, if %5 is divergent we have a bug in ISel. We now should not have any V2S copy with the divergent source.

Look at the MIR that @skc7 quoted. %5 is divergent - it's copied from a vgpr function argument.

The BUFFER_LOAD_DWORDX4_OFFEN is one of the (if I remember correctly, 5) exceptional opcodes for which a V2S copy is created even when the copy source is divergent.
There is no bug in ISel. We have the value in a VGPR because it is divergent, and this is correct. The V2S copy is created in InstrEmitter just because the opcode requires an SGPR.
We have several other such opcodes:

V_WRITELANE_B32, S_BUFFER_LOAD_DWORD_IMM, BUFFER_LOAD_FORMAT_X_OFFSET, BUFFER_LOAD_FORMAT_X_IDXEN, BUFFER_LOAD_FORMAT_X_OFFEN, BUFFER_LOAD_FORMAT_X_BOTHEN, IMAGE_SAMPLE_V1_V2
And this is really a TODO. For each of them, we should make a design and change legalizeOperand correspondingly.

Nov 30 2022, 3:19 AM · Restricted Project, Restricted Project

Nov 24 2022

skc7 updated the diff for D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.

Rebase.

Nov 24 2022, 8:09 AM · Restricted Project, Restricted Project

Nov 23 2022

skc7 requested review of D138637: [InstCombine] Combine opaque pointer single index GEP and with src GEP by matching the types.
Nov 23 2022, 9:38 PM · Restricted Project, Restricted Project
skc7 abandoned D130790: Fix failing tests for "[Clang][Attribute] Introduce maybe_undef attribute for function arguments which accepts undef values".
Nov 23 2022, 9:11 PM · Restricted Project, Restricted Project
skc7 abandoned D136544: [SLP] For vectorizing chains in basic block, decide order of PHI nodes based on their result use..
Nov 23 2022, 9:10 PM · Restricted Project, Restricted Project
skc7 abandoned D124496: [Clang][Attr] clanf-format update.
Nov 23 2022, 9:06 PM · Restricted Project, Restricted Project

Nov 17 2022

skc7 added inline comments to rG87a0b1bd233a: [InstSimplify] Remove zero-index opaque pointer GEP.
Nov 17 2022, 5:38 AM

Nov 16 2022

skc7 abandoned D124158: [Clang][Attr] Skip adding noundef attribute to arguments when function has convergent attribute.
Nov 16 2022, 9:47 PM · Restricted Project, Restricted Project, Restricted Project
skc7 abandoned D128700: [AMDGPU][Clang] Skip adding noundef attribute to AMDGPU HIP device functions.
Nov 16 2022, 9:46 PM · Restricted Project, Restricted Project, Restricted Project

Nov 8 2022

skc7 committed rG42bce72536ad: Reapply "[SLP] Extend reordering data of tree entry to support PHInodes". (authored by skc7).
Reapply "[SLP] Extend reordering data of tree entry to support PHInodes".
Nov 8 2022, 7:53 AM · Restricted Project, Restricted Project
skc7 closed D137537: [SLP] Extend reordering data of tree entry to support PHI nodes.
Nov 8 2022, 7:53 AM · Restricted Project, Restricted Project
skc7 set the repository for D137537: [SLP] Extend reordering data of tree entry to support PHI nodes to rG LLVM Github Monorepo.
Nov 8 2022, 5:27 AM · Restricted Project, Restricted Project
skc7 updated the diff for D137537: [SLP] Extend reordering data of tree entry to support PHI nodes.

Rebase. D137567, D137569 commits are merged upstream.

Nov 8 2022, 5:27 AM · Restricted Project, Restricted Project
skc7 committed rG46d53f45d89b: [SLP][NFC] Restructure getInsertIndex (authored by skc7).
[SLP][NFC] Restructure getInsertIndex
Nov 8 2022, 4:39 AM · Restricted Project, Restricted Project
skc7 closed D137567: [SLP][NFC] Restructure getInsertIndex.
Nov 8 2022, 4:39 AM · Restricted Project, Restricted Project
skc7 added a comment to D134423: [AMDGPU] Fix vgpr2sgpr copy analysis to check scalar operands of buffer instructions use scalar registers..

%8:sreg_32 = COPY %5:vgpr_32
%7:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %4:vgpr_32, killed %6:sgpr_128, %8:sreg_32, 0, 0, 0, 0, implicit $exec ::

I need more context. Is %5 uniform?

I think that I've got an idea behind this patch. Let's say %5 is uniform. Then we've got to try to promote all the %8 descendants to SALU if possible.
In some cases, it appears that such a copy has few or even no SALU descendants, and according to the common algorithm should be converted to VALU.
When the conversion is done, legalizeOperands creates the waterfall loop which is obviously much worse than inserting the v_readfirstlane_b32.
As far as I understand, @skc7 addresses this scenario and aims to avoid an unnecessary waterfall loop.
BTW, if %5 is divergent we have a bug in ISel. We now should not have any V2S copy with the divergent source.

Nov 8 2022, 12:46 AM · Restricted Project, Restricted Project
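
The cost difference between a waterfall loop and a single v_readfirstlane_b32 mentioned in the comment above can be illustrated with a small lane-level simulation. This is a simplified model, not the code that legalizeOperands emits: `waterfallIterations` is a hypothetical helper, and each vector element stands for the value one lane holds in the VGPR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulates a waterfall loop over per-lane values and returns how many
// loop iterations are needed until every lane has been serviced.
std::size_t waterfallIterations(const std::vector<uint32_t> &laneValues) {
  std::vector<bool> active(laneValues.size(), true);
  std::size_t remaining = laneValues.size();
  std::size_t iterations = 0;
  while (remaining > 0) {
    // Model of readfirstlane: take the value held by the first active lane.
    std::size_t first = 0;
    while (!active[first])
      ++first;
    const uint32_t scalar = laneValues[first];
    // Service every active lane that holds the same value, then retire it.
    for (std::size_t lane = 0; lane < laneValues.size(); ++lane) {
      if (active[lane] && laneValues[lane] == scalar) {
        active[lane] = false;
        --remaining;
      }
    }
    ++iterations;
  }
  return iterations;
}
```

With a uniform value the loop body runs once, so the loop overhead buys nothing over a single readfirstlane; the iteration count only grows with the number of distinct values across lanes.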
skc7 updated the diff for D137537: [SLP] Extend reordering data of tree entry to support PHI nodes.

Rebase. Moved fixes for scalable vectors to reviews: D137567, D137569

Nov 8 2022, 12:35 AM · Restricted Project, Restricted Project

Nov 7 2022

skc7 committed rG9d96feb19b57: [SLP][NFC] Restructure areTwoInsertFromSameBuildVector (authored by skc7).
[SLP][NFC] Restructure areTwoInsertFromSameBuildVector
Nov 7 2022, 8:04 PM · Restricted Project, Restricted Project
skc7 closed D137569: [SLP][NFC] Restructure areTwoInsertFromSameBuildVector.
Nov 7 2022, 8:04 PM · Restricted Project, Restricted Project
skc7 updated the diff for D137567: [SLP][NFC] Restructure getInsertIndex.

Changes as per review suggestion.

Nov 7 2022, 7:35 PM · Restricted Project, Restricted Project