ruiling (Ruiling, Song)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 21 2017, 5:58 PM (278 w, 4 d)

Recent Activity

Thu, Jun 23

ruiling committed rG49b8ca3f7c4f: AMDGPU: Don't crash on global_ctor/dtor declaration (authored by ruiling).
AMDGPU: Don't crash on global_ctor/dtor declaration
Thu, Jun 23, 6:06 AM · Restricted Project, Restricted Project
ruiling closed D128320: AMDGPU: Don't crash on global_ctor/dtor declaration.
Thu, Jun 23, 6:05 AM · Restricted Project, Restricted Project

Wed, Jun 22

ruiling updated the diff for D128320: AMDGPU: Don't crash on global_ctor/dtor declaration.

address review comments

Wed, Jun 22, 6:03 PM · Restricted Project, Restricted Project

Tue, Jun 21

ruiling requested review of D128320: AMDGPU: Don't crash on global_ctor/dtor declaration.
Tue, Jun 21, 9:55 PM · Restricted Project, Restricted Project
ruiling committed rG4dcb42fae572: AMDGPU: Skip unexpected CFG in SIOptimizeVGPRLiveRange (authored by ruiling).
AMDGPU: Skip unexpected CFG in SIOptimizeVGPRLiveRange
Tue, Jun 21, 9:51 PM · Restricted Project, Restricted Project
ruiling closed D128193: AMDGPU: Skip unexpected CFG in SIOptimizeVGPRLiveRange.
Tue, Jun 21, 9:50 PM · Restricted Project, Restricted Project
ruiling accepted D128185: [AMDGPU] Set GFX11 null export target based on export attributes.

LGTM

Tue, Jun 21, 1:00 AM · Restricted Project, Restricted Project

Mon, Jun 20

ruiling requested review of D128193: AMDGPU: Skip unexpected CFG in SIOptimizeVGPRLiveRange.
Mon, Jun 20, 5:56 AM · Restricted Project, Restricted Project

Wed, Jun 15

ruiling added a comment to D123231: [StructurizeCFG] Improve basic block ordering.

In D127831 I tried a different approach to avoid inserting an excessive number of boolean values during loop-exit unification. I just tested that change against the LLVM IR Brendon shared with me. It shows the change could help reduce the number of registers as well as compile time. Unfortunately, I still hit the error "unhandled SGPR spill to memory" from SGPRSpillBuilder in SIRegisterInfo.cpp. Can that limitation be fixed? I did some register pressure comparisons: the approach I proposed seems to use far fewer VGPRs than (D123230 + D123231), but more SGPRs. I haven't yet looked into why the behavior differs, and I think that needs more investigation. Still, it looks like D127831 might help us generate better code, because we can use one VGPR as the backup storage for spilling 64/32 SGPRs. The idea used there is also much easier to follow.

Wed, Jun 15, 12:35 AM · Restricted Project, Restricted Project
ruiling requested review of D127831: BasicBlockUtils: Add a new way for CreateControlFlowHub().
Wed, Jun 15, 12:17 AM · Restricted Project, Restricted Project
ruiling requested review of D127830: NFC: restructure code for CreateControlFlowHub().
Wed, Jun 15, 12:15 AM · Restricted Project, Restricted Project
ruiling abandoned D127829: NFC: restructure code for CreateControlFlowHub().

wrong version.

Wed, Jun 15, 12:13 AM · Restricted Project, Restricted Project
ruiling requested review of D127829: NFC: restructure code for CreateControlFlowHub().
Wed, Jun 15, 12:11 AM · Restricted Project, Restricted Project

May 9 2022

ruiling added a comment to D124981: [AMDGPU] Enable WQM if demotes and softwqm are combined.

softwqm is used by our implementation of subgroup operations: if helper invocations are there, then subgroup operations must interact with them.

Helper invocations exist in one of two cases:

  1. There are "hardwqm" instructions (e.g. sampler).
  2. Invocations are explicitly demoted.
May 9 2022, 7:56 PM · Restricted Project, Restricted Project

May 8 2022

ruiling added a comment to D124981: [AMDGPU] Enable WQM if demotes and softwqm are combined.

The demote itself does not need to enable wqm. So I think soft_wqm should not depend on demote to enable wqm. If some graphics operation needs to run under wqm, it should directly use the non-soft version.

May 8 2022, 8:41 AM · Restricted Project, Restricted Project

Apr 13 2022

ruiling committed rG1e01f95057a7: LowerSwitch: Avoid inserting NewDefault block (authored by ruiling).
LowerSwitch: Avoid inserting NewDefault block
Apr 13 2022, 10:32 PM · Restricted Project, Restricted Project
ruiling committed rG7c87d75d74f3: test: Don't depend on behavior of switch lower in one test. NFC (authored by ruiling).
test: Don't depend on behavior of switch lower in one test. NFC
Apr 13 2022, 10:31 PM · Restricted Project, Restricted Project
ruiling closed D123607: LowerSwitch: Avoid inserting NewDefault block.
Apr 13 2022, 10:31 PM · Restricted Project, Restricted Project
ruiling closed D123606: test: Don't depend on behavior of switch lower in one test. NFC.
Apr 13 2022, 10:31 PM · Restricted Project, Restricted Project

Apr 12 2022

ruiling added a comment to D123231: [StructurizeCFG] Improve basic block ordering.

which increases compilation time and register pressure.

Have you looked at which part is responsible for the compilation time increase? Is it possible that we are hitting an inefficiency in a certain pass?
The "register pressure" here specifically means SGPR usage, right?

Apr 12 2022, 9:38 AM · Restricted Project, Restricted Project
ruiling added a comment to D123607: LowerSwitch: Avoid inserting NewDefault block.

the extra NewDefault is causing unstructured CFG

Just curious: what does "unstructured CFG" mean here? Is there an exact definition? Thanks.

Apr 12 2022, 7:35 AM · Restricted Project, Restricted Project
ruiling requested review of D123607: LowerSwitch: Avoid inserting NewDefault block.
Apr 12 2022, 6:44 AM · Restricted Project, Restricted Project
ruiling requested review of D123606: test: Don't depend on behavior of switch lower in one test. NFC.
Apr 12 2022, 6:43 AM · Restricted Project, Restricted Project

Apr 11 2022

ruiling accepted D123480: [AMDGPU] Graceful abort for waterfalls in SIOptimizeVGPRLiveRange.

Sounds good to me, thanks for the fix!

Apr 11 2022, 7:42 PM · Restricted Project, Restricted Project

Mar 27 2022

ruiling accepted D122200: [AMDGPU] Split waterfall loop exec manipulation.

LGTM

Mar 27 2022, 10:45 PM · Restricted Project, Restricted Project

Mar 23 2022

ruiling added inline comments to D122200: [AMDGPU] Split waterfall loop exec manipulation.
Mar 23 2022, 7:20 PM · Restricted Project, Restricted Project
ruiling added a comment to D122200: [AMDGPU] Split waterfall loop exec manipulation.

I agree this is a good idea and the change looks good; I have only a few minor comments. By the way, is this the only case where we modify EXEC in the middle of a block?

Mar 23 2022, 2:48 AM · Restricted Project, Restricted Project

Mar 17 2022

ruiling accepted D121277: [MachineSink] Check block prologue interference.

To the best of my knowledge, this should work. But please wait one or two days in case others have more comments. Please also update the commit message before pushing.

Mar 17 2022, 5:22 AM · Restricted Project, Restricted Project
ruiling added a comment to D121277: [MachineSink] Check block prologue interference.

Thanks for the change. It mostly looks good to me, with only a few minor comments.

Mar 17 2022, 12:59 AM · Restricted Project, Restricted Project

Mar 15 2022

ruiling added a comment to D121277: [MachineSink] Check block prologue interference.

Try this. I don't know what the most likely pattern would be in real cases; this is just to show that sinking the instruction that defines $sgpr0_sgpr1 into the successor block would break the register dependency of type (a).

---
name:            _amdgpu_ps_main
alignment:       1
tracksRegLiveness: true
registers:       []
liveins:
  - { reg: '$sgpr4', virtual-reg: '' }
body:             |
  bb.0:
    successors: %bb.1(0x80000000)
    liveins: $sgpr4, $sgpr5
Mar 15 2022, 9:03 PM · Restricted Project, Restricted Project
ruiling added reviewers for D121277: [MachineSink] Check block prologue interference: arsenm, nhaehnle.

I agree this issue should be fixed here. Machine sinking should check for register dependencies between the sunk instruction and the prologue instructions in the successor block.
But I think there are two kinds of register dependency that need to be checked:

a.) The sunk instruction defines a register that is used by a successor prologue instruction.
b.) The sunk instruction has a source physical register that is overwritten by a successor prologue instruction.

I think you are currently fixing the second kind of dependency. Maybe check the first kind in the same change?
The first kind (def-use) of dependency should also apply to pre-RA sinking.
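
The two dependency kinds above can be sketched as a small model. This is only an illustration under simplifying assumptions: instructions are reduced to sets of register names, register aliasing and sub-registers are ignored, and `Instr` and `sinking_conflict` are hypothetical names, not part of MachineSink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Toy instruction: a name plus the registers it defines and uses."""
    name: str
    defs: frozenset = frozenset()
    uses: frozenset = frozenset()


def sinking_conflict(candidate, prologue):
    """Return "a" or "b" for the first dependency that forbids sinking
    `candidate` below the successor block's prologue, or None if the
    toy model finds no conflict.

    "a": candidate defines a register that a prologue instruction uses.
    "b": candidate reads a register that a prologue instruction overwrites.
    """
    for p in prologue:
        if candidate.defs & p.uses:
            return "a"
        if candidate.uses & p.defs:
            return "b"
    return None


# Hypothetical example: the prologue reads $sgpr0_sgpr1, so an instruction
# that defines it must not be sunk past the prologue.
prologue = [Instr("prologue_op", defs=frozenset({"exec"}),
                  uses=frozenset({"sgpr0_sgpr1"}))]
def_of_pair = Instr("def_sgpr0_sgpr1", defs=frozenset({"sgpr0_sgpr1"}))
reader_of_exec = Instr("reads_exec", uses=frozenset({"exec"}))
unrelated = Instr("unrelated", defs=frozenset({"vgpr0"}),
                  uses=frozenset({"vgpr1"}))
```

A real check would have to work on register units so that overlapping registers such as $sgpr0 and $sgpr0_sgpr1 are treated as conflicting; the string comparison here does not do that.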

Mar 15 2022, 1:58 AM · Restricted Project, Restricted Project

Mar 14 2022

ruiling committed rG98dd390573dc: AMDGPU: Use removeAllRegUnitsForPhysReg() (authored by ruiling).
AMDGPU: Use removeAllRegUnitsForPhysReg()
Mar 14 2022, 7:29 PM · Restricted Project
ruiling closed D117014: AMDGPU: Use removeAllRegUnitsForPhysReg().
Mar 14 2022, 7:28 PM · Restricted Project, Restricted Project

Mar 13 2022

Herald added a project to D117014: AMDGPU: Use removeAllRegUnitsForPhysReg(): Restricted Project.

I agree we need to figure out why the register coalescer introduces liveness for sub-registers of a reserved physical register, but I currently don't have enough time to work on that. I think that can be done separately without blocking this change. I would like to apply this change to fix the issue here if there is no objection. I noticed we already have a bug report for this issue: https://github.com/llvm/llvm-project/issues/54202.

Mar 13 2022, 9:39 PM · Restricted Project, Restricted Project

Mar 8 2022

ruiling added inline comments to D121268: [AMDGPU] Control flow pseudos are not part of block prologue.
Mar 8 2022, 7:37 PM · Restricted Project, Restricted Project

Feb 14 2022

ruiling accepted D119399: [MachineSink] Use SkipPHIsAndLabels for sink insertion points.

LGTM

Feb 14 2022, 5:41 AM · Restricted Project

Feb 10 2022

ruiling added inline comments to D119399: [MachineSink] Use SkipPHIsAndLabels for sink insertion points.
Feb 10 2022, 7:36 PM · Restricted Project

Feb 9 2022

ruiling accepted D117796: AMDGPU: Fix LiveVariables error after lowering SI_END_CF.

LGTM

Feb 9 2022, 8:07 PM · Restricted Project, Restricted Project

Feb 7 2022

ruiling added a comment to D117796: AMDGPU: Fix LiveVariables error after lowering SI_END_CF.

The reason we are splitting the block is so we can place the exec modification before the copies for lowered phis. We cannot split the block as needed while the phis still exist since phis need to be the first instructions in the block

I don't understand: we are moving the instructions after SI_END_CF into a new block, so all the PHIs and SI_END_CF will still be in the old block. Why would we be breaking the phis?
To be clear, I am not strongly asking for this now. I am fine with fixing the LiveVariables update issue.

Feb 7 2022, 7:25 PM · Restricted Project, Restricted Project

Feb 5 2022

ruiling committed rG2f4d44bcd4a1: AMDGPU: add test to show wwm register overwrite issue (authored by ruiling).
AMDGPU: add test to show wwm register overwrite issue
Feb 5 2022, 8:39 PM
ruiling committed rG0719c43735b2: AMDGPU: Don't clobber source register for V_SET_INACTIVE_* (authored by ruiling).
AMDGPU: Don't clobber source register for V_SET_INACTIVE_*
Feb 5 2022, 8:39 PM
ruiling closed D117482: AMDGPU: Don't clobber source register for V_SET_INACTIVE_*.
Feb 5 2022, 8:38 PM · Restricted Project
ruiling closed D117527: AMDGPU: add test to show wwm register overwrite issue.
Feb 5 2022, 8:38 PM · Restricted Project

Jan 30 2022

ruiling added a comment to D118185: [AMDGPU] Mark v_cmp* convergent.

Maybe another solution is to try to deprecate and remove amdgcn.icmp in favour of amdgcn.ballot, and try to lower ballot to a single machine instruction that is marked "convergent".

Jan 30 2022, 6:31 PM · Restricted Project
ruiling added inline comments to D117796: AMDGPU: Fix LiveVariables error after lowering SI_END_CF.
Jan 30 2022, 8:29 AM · Restricted Project, Restricted Project

Jan 26 2022

ruiling added a comment to D117796: AMDGPU: Fix LiveVariables error after lowering SI_END_CF.

I would hope we can move the block split logic into a separate pass that can be scheduled before we construct liveness information, for example during addPreRegAlloc(). It is expensive to update either LiveVariables or LiveIntervals. Sounds good?

The whole reason this pass is here is because it needs to be done after phi elimination, so you can't really move it anywhere else. LiveVariables we should also just be pushing to eliminate entirely

I think the block split itself could be done before phi elimination. If you detect that the source operand of SI_END_CF is defined in the same block, you can split the block. That works just like what we are doing now. Did I miss something? Even if you deprecate LiveVariables, you still need to update LiveIntervals, which still requires searching all virtual registers to see whose LiveInterval needs updating.

Jan 26 2022, 7:17 PM · Restricted Project, Restricted Project
ruiling added a comment to D117796: AMDGPU: Fix LiveVariables error after lowering SI_END_CF.

I would hope we can move the block split logic into a separate pass that can be scheduled before we construct liveness information, for example during addPreRegAlloc(). It is expensive to update either LiveVariables or LiveIntervals. Sounds good?

The whole reason this pass is here is because it needs to be done after phi elimination, so you can't really move it anywhere else. LiveVariables we should also just be pushing to eliminate entirely

Jan 26 2022, 6:55 PM · Restricted Project, Restricted Project
ruiling added a comment to D117796: AMDGPU: Fix LiveVariables error after lowering SI_END_CF.

I would hope we can move the block split logic into a separate pass that can be scheduled before we construct liveness information, for example during addPreRegAlloc(). It is expensive to update either LiveVariables or LiveIntervals. Sounds good?

Jan 26 2022, 6:43 PM · Restricted Project, Restricted Project
ruiling added a comment to D118250: AMDGPU: Mark control flow intrinsics non-duplicable.

I don't think this is a good idea. We don't actually need a structured CFG at this point, and tail duplicating isn't exactly unstructuring anyway. This is not an alternative to fixing the LiveVariables update problem, it's just the testcase that broke happened to have appeared due to tail duplication

Jan 26 2022, 7:32 AM · Restricted Project, Restricted Project
ruiling added a comment to D118250: AMDGPU: Mark control flow intrinsics non-duplicable.

Is this an alternative to D117796?

Jan 26 2022, 7:30 AM · Restricted Project, Restricted Project
ruiling requested review of D118250: AMDGPU: Mark control flow intrinsics non-duplicable.
Jan 26 2022, 7:19 AM · Restricted Project, Restricted Project
ruiling added a comment to D118185: [AMDGPU] Mark v_cmp* convergent.

I am not sure this is a good idea. Actually an ordinary V_CMP is still sinkable. It is only the V_CMP lowered from llvm.amdgcn.icmp that cannot be sunk.

Right, an ordinary V_CMP is still sinkable because of the way the result is used, there is an implicit guarantee that bits in the result corresponding to inactive lanes (at the point of use) will be ignored. So marking *all* V_CMPs convergent is overkill, although it does not seem to have any bad effects on the test suite. On the other hand, duplicating all the V_CMP instructions (so we have separate convergent and non-convergent versions of them) doesn't seem like a good idea, because there are lots of them.

If duplicating V_CMP into convergent and non-convergent versions would make the code overly complex, I would go for this simple solution. We can revisit this issue when we actually hit a performance problem here. Let's see if others have a different opinion.

Jan 26 2022, 4:25 AM · Restricted Project
ruiling added a comment to D117482: AMDGPU: Don't clobber source register for V_SET_INACTIVE_*.

If I understand correctly, the root issue is the Register Coalescer not being aware of the wave-level CFG?

Actually it is because WWM registers have weird liveness :( I don't have a good idea of how to model it. Even with a wave-level CFG, it is still hard, right?

Jan 26 2022, 12:25 AM · Restricted Project

Jan 25 2022

ruiling added a comment to D118185: [AMDGPU] Mark v_cmp* convergent.

I am not sure this is a good idea. Actually an ordinary V_CMP is still sinkable. It is only the V_CMP lowered from llvm.amdgcn.icmp that cannot be sunk.

Jan 25 2022, 11:26 PM · Restricted Project

Jan 24 2022

ruiling added inline comments to D117909: [AMDGPU] Remove cndmask from readsExecAsData.
Jan 24 2022, 7:14 AM · Restricted Project
ruiling added inline comments to D117796: AMDGPU: Fix LiveVariables error after lowering SI_END_CF.
Jan 24 2022, 6:03 AM · Restricted Project, Restricted Project
ruiling updated subscribers of D117909: [AMDGPU] Remove cndmask from readsExecAsData.
Jan 24 2022, 12:16 AM · Restricted Project

Jan 23 2022

ruiling added a comment to D117482: AMDGPU: Don't clobber source register for V_SET_INACTIVE_*.

Ping. This fixes an issue from a real app. Can anyone take a close look?

Jan 23 2022, 11:25 PM · Restricted Project
ruiling added inline comments to D117014: AMDGPU: Use removeAllRegUnitsForPhysReg().
Jan 23 2022, 11:14 PM · Restricted Project, Restricted Project

Jan 17 2022

ruiling updated the diff for D117482: AMDGPU: Don't clobber source register for V_SET_INACTIVE_*.

Move tests into separate change D117527

Jan 17 2022, 5:05 PM · Restricted Project
ruiling requested review of D117527: AMDGPU: add test to show wwm register overwrite issue.
Jan 17 2022, 5:03 PM · Restricted Project
ruiling requested review of D117482: AMDGPU: Don't clobber source register for V_SET_INACTIVE_*.
Jan 17 2022, 6:43 AM · Restricted Project

Jan 11 2022

ruiling updated the diff for D117014: AMDGPU: Use removeAllRegUnitsForPhysReg().

also for SCC.

Jan 11 2022, 6:15 AM · Restricted Project, Restricted Project
ruiling requested review of D117014: AMDGPU: Use removeAllRegUnitsForPhysReg().
Jan 11 2022, 5:25 AM · Restricted Project, Restricted Project

Jan 10 2022

ruiling added inline comments to D115551: [AMDGPU] Do not reserve any VGPR for SGPR spills.
Jan 10 2022, 5:33 PM · Restricted Project
ruiling accepted D116714: AMDGPU: Fix LiveVariables error after optimizing VGPR ranges.

LGTM with one minor comment.

Jan 10 2022, 6:35 AM · Restricted Project

Jan 7 2022

ruiling added inline comments to D116714: AMDGPU: Fix LiveVariables error after optimizing VGPR ranges.
Jan 7 2022, 5:22 PM · Restricted Project
ruiling added inline comments to D116714: AMDGPU: Fix LiveVariables error after optimizing VGPR ranges.
Jan 7 2022, 6:41 AM · Restricted Project

Nov 23 2021

ruiling accepted D112696: CycleInfo: Introduce cycles as a generalization of loops.

LGTM. Please wait a few days before committing in case others have further comments.

Nov 23 2021, 7:26 PM · Restricted Project

Nov 19 2021

ruiling added inline comments to D112696: CycleInfo: Introduce cycles as a generalization of loops.
Nov 19 2021, 7:34 PM · Restricted Project

Nov 17 2021

ruiling added a comment to D112696: CycleInfo: Introduce cycles as a generalization of loops.

Thanks for this revision; the refactoring makes a lot of sense to me. I have added some further comments. The other parts look pretty good to me.

Nov 17 2021, 10:51 PM · Restricted Project

Nov 7 2021

ruiling abandoned D99051: [InstCombine] Stop folding inttoptr+bitcast if multiple uses.
Nov 7 2021, 10:45 PM · Restricted Project
ruiling added inline comments to D112696: CycleInfo: Introduce cycles as a generalization of loops.
Nov 7 2021, 10:11 PM · Restricted Project
ruiling added inline comments to D112696: CycleInfo: Introduce cycles as a generalization of loops.
Nov 7 2021, 5:54 AM · Restricted Project

Nov 5 2021

ruiling added inline comments to D112696: CycleInfo: Introduce cycles as a generalization of loops.
Nov 5 2021, 7:33 PM · Restricted Project

Nov 1 2021

ruiling added a comment to D112731: [AMDGPU] Really preserve LiveVariables in SILowerControlFlow.

sounds good to me.

Nov 1 2021, 6:32 AM · Restricted Project
ruiling added inline comments to D112731: [AMDGPU] Really preserve LiveVariables in SILowerControlFlow.
Nov 1 2021, 5:31 AM · Restricted Project
ruiling added inline comments to D112731: [AMDGPU] Really preserve LiveVariables in SILowerControlFlow.
Nov 1 2021, 5:00 AM · Restricted Project
ruiling accepted D112731: [AMDGPU] Really preserve LiveVariables in SILowerControlFlow.

To the best of my knowledge, the change looks good. But please wait one or two days in case others have more comments. There is also one corner case that I think needs to be updated.

Nov 1 2021, 12:57 AM · Restricted Project

Oct 29 2021

ruiling added inline comments to D112731: [AMDGPU] Really preserve LiveVariables in SILowerControlFlow.
Oct 29 2021, 4:28 AM · Restricted Project

Oct 28 2021

ruiling added inline comments to D112731: [AMDGPU] Really preserve LiveVariables in SILowerControlFlow.
Oct 28 2021, 5:44 PM · Restricted Project
ruiling added inline comments to D112731: [AMDGPU] Really preserve LiveVariables in SILowerControlFlow.
Oct 28 2021, 9:42 AM · Restricted Project
ruiling added inline comments to D112731: [AMDGPU] Really preserve LiveVariables in SILowerControlFlow.
Oct 28 2021, 9:34 AM · Restricted Project

Sep 29 2021

ruiling abandoned D109754: AMDGPU: Use -1/0 when copying from SCC to SGPR.
Sep 29 2021, 7:20 PM · Restricted Project
ruiling committed rG52785989e95d: AMDGPU: Broadcast scalar boolean to vector boolean explicitly (authored by ruiling).
AMDGPU: Broadcast scalar boolean to vector boolean explicitly
Sep 29 2021, 7:19 PM
ruiling closed D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly.
Sep 29 2021, 7:19 PM · Restricted Project
ruiling added a comment to D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly.

I will submit this version tomorrow.

Sep 29 2021, 7:30 AM · Restricted Project
ruiling updated the diff for D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly.

remember wave-size and reuse

Sep 29 2021, 7:28 AM · Restricted Project

Sep 27 2021

ruiling retitled D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly from AMDGPU: Lower one copy from SCC early for SelectionDAG to AMDGPU: Broadcast scalar boolean to vector boolean explicitly.
Sep 27 2021, 7:26 PM · Restricted Project

Sep 21 2021

ruiling added a comment to D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly.

Using -1 is also misleading, since a true boolean value is also anded with exec

It depends on how you look at the problem.
We can formalize the boolean values stored in an SGPR like this: for uniform booleans, the bits corresponding to active lanes hold the effective value and the other bits are undefined. For divergent booleans, the bits of active lanes hold the effective value and the other bits are zero.
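
As a rough illustration of this formalization, here is a small model that treats an SGPR and EXEC as plain integers. It assumes a 64-lane wave, and the function names are hypothetical, not LLVM APIs.

```python
WAVE_SIZE = 64  # assumption: wave64; a wave32 model would use 32
FULL_MASK = (1 << WAVE_SIZE) - 1


def uniform_bool_value(sgpr: int, exec_mask: int) -> bool:
    """Read a uniform boolean: only bits under active lanes are meaningful,
    so whatever sits in inactive lanes is ignored."""
    if exec_mask == 0:
        raise ValueError("no active lanes, the value is undefined")
    # All active lanes agree for a uniform boolean, so any set active bit
    # means the value is true.
    return (sgpr & exec_mask) != 0


def is_valid_divergent_bool(sgpr: int, exec_mask: int) -> bool:
    """A divergent boolean must have zeros in every inactive lane."""
    inactive = ~exec_mask & FULL_MASK
    return (sgpr & inactive) == 0
```

For example, with EXEC = 0b1010 the SGPR value 0b0101 reads as a false uniform boolean, because its set bits all belong to inactive lanes, while the same value is not a well-formed divergent boolean.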

Sep 21 2021, 7:50 PM · Restricted Project
ruiling added a comment to D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.

The patch LGTM. Does anyone have a different idea?

Sep 21 2021, 12:14 AM · Restricted Project

Sep 20 2021

ruiling added a comment to D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly.

Hi @arsenm, what do you think of this idea?

Sep 20 2021, 11:46 PM · Restricted Project

Sep 17 2021

ruiling added a comment to D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.

IMO, it is more useful to use an end-to-end test (from LLVM IR to assembly). We do lots of work scattered across different places to deal with boolean values. Things may change in the future, and we may move this logic to other passes.
By the way, I don't know why you created a file with mode 755. I don't know whether it matters in a patch.

Sep 17 2021, 7:01 PM · Restricted Project

Sep 16 2021

ruiling added a reviewer for D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC: arsenm.

When we pass scalar i1 values to uniform VALU instructions,
we use the following representation: false - 0, true - 0xffffffffffffffff.
VALU instructions only process the lanes that are active, as controlled by the EXEC mask.
We need to filter out the inactive-lane bits when copying the computation result back to SCC.

Sep 16 2021, 3:31 PM · Restricted Project
ruiling added a comment to D109754: AMDGPU: Use -1/0 when copying from SCC to SGPR.

I think the cause of the issue is that GlobalISel defines the COPY from SCC to SGPR as a select of 0/1 based on SCC, but in SelectionDAG the COPY from SCC to SGPR means a select of 0/-1 based on SCC. Both paths will have some such copies to be lowered in copyPhysReg(). I have lowered the copy earlier in D109889.
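
The difference between the two conventions can be sketched with plain integers. This is a toy model assuming a 64-bit SGPR pair; the function names are hypothetical and it is not the copyPhysReg() lowering.

```python
WAVE_SIZE = 64  # assumption: 64-bit SGPR pair used as a wave64 lane mask
ALL_ONES = (1 << WAVE_SIZE) - 1  # the bit pattern of -1


def copy_scc_zero_one(scc: bool) -> int:
    """COPY from SCC read as select 0/1 (the GlobalISel meaning above)."""
    return 1 if scc else 0


def copy_scc_zero_minus_one(scc: bool) -> int:
    """COPY from SCC read as select 0/-1 (the SelectionDAG meaning above)."""
    return ALL_ONES if scc else 0


def lane_is_true(mask: int, lane: int) -> bool:
    """Interpret the copied value as a per-lane boolean mask."""
    return ((mask >> lane) & 1) == 1
```

If the destination is later consumed as a lane mask, the 0/1 result is true only for lane 0, while the 0/-1 result is true for every lane, so one lowering cannot serve both meanings.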

Sep 16 2021, 8:14 AM · Restricted Project
ruiling requested review of D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly.
Sep 16 2021, 8:13 AM · Restricted Project

Sep 14 2021

ruiling added a comment to D109754: AMDGPU: Use -1/0 when copying from SCC to SGPR.

The change makes sense to me in general.

When I tried doing the same long time ago, Matt rightly pointed out (https://reviews.llvm.org/D81925#inline-753225) that sign extending does not correspond to the booleans being defined as zero extended. So, wouldn't your change also require an accompanying change of boolean contents to ZeroOrNegativeOneBooleanContent instead of ZeroOrOneBooleanContent in SIISelLowering.cpp?

No, because this is producing a mask and not really an extended value

Well in this context it's ambiguous since we don't know what the destination register is used for. I think this is a problem earlier in the flow

Sep 14 2021, 7:29 AM · Restricted Project
ruiling requested review of D109754: AMDGPU: Use -1/0 when copying from SCC to SGPR.
Sep 14 2021, 4:56 AM · Restricted Project

Aug 12 2021

ruiling committed rGe1beebbac5da: SplitKit: Don't further split subrange mask in buildCopy (authored by ruiling).
SplitKit: Don't further split subrange mask in buildCopy
Aug 12 2021, 4:37 PM
ruiling closed D107829: SplitKit: Don't further split subrange mask in buildCopy.
Aug 12 2021, 4:37 PM · Restricted Project