cfang (Changpeng Fang)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 30 2015, 11:41 AM (158 w, 4 d)

Recent Activity

Thu, Dec 13

cfang added inline comments to D55568: AMDGPU: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing..
Thu, Dec 13, 8:29 AM

Tue, Dec 11

cfang created D55568: AMDGPU: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing..
Tue, Dec 11, 12:25 PM

Mon, Dec 10

cfang added a comment to D55241: AMDGPU: Should always start from the first register in VGPR indexing..

No, it's not committed. Variable + constant is a common case in general.

It would probably be better to do this fold in the DAG for now though

Mon, Dec 10, 10:18 AM

Wed, Dec 5

cfang updated the diff for D55241: AMDGPU: Should always start from the first register in VGPR indexing..

Fix typos.

Wed, Dec 5, 2:36 PM
cfang added a comment to D55241: AMDGPU: Should always start from the first register in VGPR indexing..

D30466 is the primitive computeKnownBits

Wed, Dec 5, 11:23 AM
cfang added inline comments to D55241: AMDGPU: Should always start from the first register in VGPR indexing..
Wed, Dec 5, 10:28 AM
cfang added a comment to D55241: AMDGPU: Should always start from the first register in VGPR indexing..

We should try to use some known bits information to keep this. I have a patch to add a machine version, but there might be a better way

Would you please explain how your known-bits approach would resolve the negative index issue while keeping the optimization for gfx9+?
Or just post your patch. Thanks.

If you know the base index isn't negative, you don't need to disable this
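As a rough illustration of the known-bits idea discussed above, the sketch below models known-zero and known-one masks for a 32-bit index and allows the constant offset to be peeled only when the sign bit of the base is known to be zero. The `KnownBits` class, `known_bits_of_zext`, and `can_peel_offset` are hypothetical Python stand-ins written for this example; they are not LLVM code and not the patch under review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBits:
    """Hypothetical stand-in: masks of bits known to be 0 or known to be 1."""
    zero: int
    one: int
    width: int = 32

    def is_non_negative(self) -> bool:
        # The value is provably non-negative when its sign bit is known zero.
        return bool((self.zero >> (self.width - 1)) & 1)


def known_bits_of_zext(src_width: int, width: int = 32) -> KnownBits:
    """Known bits of a value zero-extended from src_width to width bits."""
    full = (1 << width) - 1
    high_zero = full & ~((1 << src_width) - 1)
    return KnownBits(zero=high_zero, one=0, width=width)


def can_peel_offset(base: KnownBits) -> bool:
    """Assumed rule for this sketch: peel only if the base cannot be negative."""
    return base.is_non_negative()


unknown_base = KnownBits(zero=0, one=0)   # nothing known about the base
zext_base = known_bits_of_zext(16)        # upper 16 bits known to be zero
print(can_peel_offset(unknown_base))  # False
print(can_peel_offset(zext_base))     # True
```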

Wed, Dec 5, 10:28 AM

Mon, Dec 3

cfang added a comment to D55241: AMDGPU: Should always start from the first register in VGPR indexing..

We should try to use some known bits information to keep this. I have a patch to add a machine version, but there might be a better way

Mon, Dec 3, 4:00 PM
cfang created D55241: AMDGPU: Should always start from the first register in VGPR indexing..
Mon, Dec 3, 3:47 PM

Oct 15 2018

cfang updated the diff for D52946: AMDGPU: Add support pattern for SUB of one bit.

Add test for non-uniform case.

Oct 15 2018, 3:10 PM

Oct 5 2018

cfang created D52946: AMDGPU: Add support pattern for SUB of one bit.
Oct 5 2018, 1:49 PM

Sep 25 2018

cfang created D52518: AMDGPU: Add Selection patterns to support add of one bit..
Sep 25 2018, 1:26 PM

May 17 2018

cfang added a reviewer for D46912: StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly: dberlin.

Thanks Justin for your comment. I added Daniel as a reviewer.

May 17 2018, 9:25 AM
cfang added a reviewer for D46912: StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly: jlebar.
May 17 2018, 8:44 AM

May 16 2018

cfang updated the diff for D46912: StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly.

Updated the test based on the comments

May 16 2018, 1:33 PM
cfang added a comment to D46912: StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly.

Right. Reverse Post Order (RPO) should be the fundamental order. But apparently there are some cases in which pure RPO does not work.
That's the reason loop depth was introduced to guard the ordering of the nodes.
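For readers unfamiliar with the term, here is a minimal sketch of computing a reverse post order over a small control-flow graph. The graph, the block names, and the `reverse_post_order` helper are made up for illustration and do not reflect how StructurizeCFG is implemented.

```python
def reverse_post_order(cfg, entry):
    """Return the blocks reachable from entry in reverse post order."""
    visited = set()
    post = []

    def dfs(block):
        visited.add(block)
        for succ in cfg.get(block, []):
            if succ not in visited:
                dfs(succ)
        # A block is emitted only after all of its unvisited successors.
        post.append(block)

    dfs(entry)
    return post[::-1]


# Hypothetical CFG: a single loop ("loop" <-> "body") with one exit.
cfg = {
    "entry": ["loop"],
    "loop": ["body", "exit"],
    "body": ["loop"],
    "exit": [],
}
print(reverse_post_order(cfg, "entry"))  # ['entry', 'loop', 'exit', 'body']
```

With this successor order the exit block lands before the loop body, so the blocks of the loop are not contiguous. That hints at why an ordering based on RPO alone may need extra information such as loop depth.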

May 16 2018, 1:32 PM
cfang added a comment to D46340: AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops..

We need this fix to unblock some important tasks. So if there are no additional comments, we need permission
to integrate the patch. Thanks.

May 16 2018, 10:04 AM

May 15 2018

cfang created D46912: StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly.
May 15 2018, 4:06 PM

May 10 2018

cfang added a comment to D46085: AMDGPU/SI: Don't promote alloca to vector for atomic load/store.

ping

May 10 2018, 8:19 AM

May 9 2018

cfang added a comment to D46340: AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops..

Any additional comments/suggestions? Thanks!

May 9 2018, 2:54 PM
cfang updated the diff for D45228: AMDGPU/SI: Handle BitCast of GEP in promoting alloca to vector.

Update based on Matt's comments.

May 9 2018, 2:52 PM
cfang added inline comments to D45228: AMDGPU/SI: Handle BitCast of GEP in promoting alloca to vector.
May 9 2018, 2:50 PM

May 7 2018

cfang accepted D46532: AMDGPU: Stop special casing constant indexes of extract_vector_elt.
May 7 2018, 2:18 PM
cfang updated the diff for D46340: AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops..

Correct indentation error and typo.

May 7 2018, 9:31 AM
cfang added a comment to D46340: AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops..
May 7 2018, 9:28 AM

May 4 2018

cfang created D46438: AMDGPU: Use eraseFromParent to delete am instruction when it is no longer needed..
May 4 2018, 9:34 AM

May 3 2018

cfang added a comment to D46340: AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops..

So this does not cover all possible cases of "irreducible" infinite loops, e.g. consider the case with basic blocks A and B where both end with a conditional branch that can go to either A or B. I.e., there are no unconditional branches in the loop.

A more robust approach would be to also transform conditional branches like this:

br i1 %cc, label %A, label %B

becomes

  br i1 true, label %DummyReturnBB, label %local_dummy
local_dummy:
  br i1 %cc, label %A, label %B
May 3 2018, 4:45 PM
cfang updated the diff for D46340: AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops..

Handle the case where the infinite loop is controlled by a conditional branch. In such a case, both edges of the branch
are backedges.

May 3 2018, 4:40 PM

May 2 2018

cfang updated the diff for D45993: AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction..

Update the test.

May 2 2018, 2:40 PM
cfang updated the diff for D46085: AMDGPU/SI: Don't promote alloca to vector for atomic load/store.

Update the test:

  1. Remove the -amdgiz from the triple since it is no longer necessary;
  2. Add "-data-layout=A5" to explicitly specify the data layout for address space 5 for alloca, and remove the "target:" line for the same data layout purpose.
May 2 2018, 2:23 PM

May 1 2018

cfang created D46340: AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops..
May 1 2018, 4:11 PM

Apr 27 2018

cfang accepted D46154: [AMDGPU][Waitcnt] Update a few lit tests to use the default waitcnt pass.

LGTM

Apr 27 2018, 9:51 AM · Restricted Project

Apr 26 2018

cfang added inline comments to D45228: AMDGPU/SI: Handle BitCast of GEP in promoting alloca to vector.
Apr 26 2018, 2:10 PM
cfang added a comment to D46085: AMDGPU/SI: Don't promote alloca to vector for atomic load/store.

What do you think of the test?
What should we do if we don't add the following line?
target datalayout = "A5"

Apr 26 2018, 1:48 PM
cfang added a comment to D45993: AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction..

What do you think of the test? Thanks.

Apr 26 2018, 1:46 PM

Apr 25 2018

cfang accepted D46089: AMDGPU: Extend extract_vector_elt fneg combine to fabs.

LGTM

Apr 25 2018, 3:49 PM
cfang added inline comments to D46085: AMDGPU/SI: Don't promote alloca to vector for atomic load/store.
Apr 25 2018, 3:48 PM
cfang updated the diff for D46085: AMDGPU/SI: Don't promote alloca to vector for atomic load/store.

Combine !isAtomic and !isVolatile checks as isSimple

Apr 25 2018, 3:43 PM
cfang created D46085: AMDGPU/SI: Don't promote alloca to vector for atomic load/store.
Apr 25 2018, 2:33 PM
cfang updated the diff for D45993: AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction..

Fix addrspacecast in the test since generic address space is 0 (default) now. Thanks, Sam!

Apr 25 2018, 10:08 AM

Apr 24 2018

cfang updated the diff for D45993: AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction..

Add a test.

Apr 24 2018, 4:16 PM

Apr 23 2018

cfang created D45993: AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction..
Apr 23 2018, 3:02 PM
cfang accepted D45973: [AMDGPU][Waitcnt] NFC. Cleanup some code.
Apr 23 2018, 10:53 AM · Restricted Project
cfang added reviewers for D45228: AMDGPU/SI: Handle BitCast of GEP in promoting alloca to vector: msearles, kzhuravl.

Add reviewers and ping.

Apr 23 2018, 10:50 AM

Apr 12 2018

cfang added a comment to D45228: AMDGPU/SI: Handle BitCast of GEP in promoting alloca to vector.

ping

Apr 12 2018, 4:10 PM

Apr 3 2018

cfang created D45228: AMDGPU/SI: Handle BitCast of GEP in promoting alloca to vector.
Apr 3 2018, 1:50 PM

Feb 16 2018

cfang updated the diff for D33559: AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements.

Update LIT tests before landing since the patch was developed a long time back.

Feb 16 2018, 11:04 AM

Feb 15 2018

cfang updated the diff for D43297: AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction..

full context diff

Feb 15 2018, 4:06 PM
cfang added a comment to D43297: AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction..

Ping

Feb 15 2018, 4:02 PM
cfang added a reviewer for D43297: AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction.: rampitec.
Feb 15 2018, 2:58 PM

Feb 14 2018

cfang created D43297: AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction..
Feb 14 2018, 9:39 AM

Feb 8 2018

cfang accepted D42760: AMDGPU: Remove tied operand from si_else.
Feb 8 2018, 4:10 PM

Feb 7 2018

cfang accepted D43049: AMDGPU: Don't crash when trying to fold implicit operands.

LGTM

Feb 7 2018, 4:03 PM

Jan 31 2018

cfang added a comment to D42548: AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature..

Needs test

Jan 31 2018, 3:14 PM
cfang updated the diff for D42548: AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature..

Update tests to expose the LLVM-ERROR without the patch.

Jan 31 2018, 3:14 PM

Jan 29 2018

cfang added a comment to D42548: AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature..

ping! Thanks.

Jan 29 2018, 1:22 PM
cfang added a comment to D42596: AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace..
In D42596#990403, @dp wrote:

It would be nice to have a few decoding tests as there is currently zero _d16 coverage for disassembler.

BTW, there is no difference in data size for affected MIMG instructions. Currently data size in dwords = count_population(dmask) + tfe. How does d16 affect this number?

I was unable to find anything helpful in docs and SP3 output is not affected by d16 for gfx9. Are data in VGPRs really packed for MIMG? :-)

For MIMG, to determine the data size for d16, we need to know whether the target has the feature UnpackedD16VMem (gfx8.0),

  1. if that feature is set, the data size is the same as without D16 bit set;
  2. if that feature is not set, then the data size is "half" of the size because we can pack two f16 values into one register.
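The data-size rule described above can be sketched as a small calculation. This is only an illustration: the function name `mimg_data_dwords` and its parameters are hypothetical, and two details are assumptions of this sketch rather than statements from the thread, namely that an odd number of packed channels rounds up to a whole register and that the TFE dword is added after packing.

```python
def mimg_data_dwords(dmask: int, tfe: bool, d16: bool, unpacked_d16_vmem: bool) -> int:
    """Estimate the data size in dwords for a MIMG instruction."""
    # Number of enabled channels: population count of the 4-bit dmask.
    channels = bin(dmask & 0xF).count("1")
    if d16 and not unpacked_d16_vmem:
        # Packed D16: two f16 values share one register.
        # Assumption: an odd channel count rounds up.
        channels = (channels + 1) // 2
    # Assumption: the TFE dword is not packed and is added afterwards.
    return channels + int(tfe)


print(mimg_data_dwords(0xF, tfe=False, d16=False, unpacked_d16_vmem=False))  # 4
print(mimg_data_dwords(0xF, tfe=False, d16=True, unpacked_d16_vmem=True))    # 4
print(mimg_data_dwords(0xF, tfe=False, d16=True, unpacked_d16_vmem=False))   # 2
```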
Jan 29 2018, 12:07 PM
cfang updated the diff for D42596: AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace..

Add Disassembler tests based on Reviewers' suggestion. Thanks.

Jan 29 2018, 11:51 AM

Jan 26 2018

cfang added a comment to D42596: AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace..

Can this be tested?

Jan 26 2018, 1:46 PM
cfang created D42596: AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace..
Jan 26 2018, 12:16 PM

Jan 25 2018

cfang created D42548: AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature..
Jan 25 2018, 9:55 AM

Jan 18 2018

cfang closed D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Patch committed to trunk:
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322903

Jan 18 2018, 3:03 PM
cfang added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Patch updated! Requesting the reviewers' check. Thanks.
<I still did not receive a message about the diff update>

Jan 18 2018, 8:35 AM
cfang updated the diff for D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Update based on Matt's suggestion: Factor out the common defs.

Jan 18 2018, 8:28 AM

Jan 17 2018

cfang added inline comments to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.
Jan 17 2018, 10:04 AM

Jan 16 2018

cfang added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Don't know why I didn't receive a message after I updated the patch. So ping here with the update message:

Jan 16 2018, 8:17 AM

Jan 15 2018

cfang updated the diff for D39912: AMDGPU/SI: Implement d16 support for image intrinsics.
  1. sync with the buffer d16 support patch to reuse some of the functionalities;
  2. Remove unnecessary functions that print "d16" in the AsmPrinter;
  3. Update based on Matt's other comments.
Jan 15 2018, 3:18 PM

Jan 11 2018

cfang added inline comments to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
Jan 11 2018, 11:25 AM
cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Redesign the function to handle Vdata to store so that it will be called only when vdata is of type d16.

Jan 11 2018, 11:24 AM

Jan 10 2018

cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Rename a function and add a new helper function based on Matt's comments.

Jan 10 2018, 4:10 PM
cfang added inline comments to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
Jan 10 2018, 3:24 PM
cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Get the output "Chain" of the intrinsic correctly. Thanks.

Jan 10 2018, 11:23 AM

Jan 9 2018

cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Update based on Matt's latest comments.

Jan 9 2018, 11:43 AM
cfang added inline comments to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
Jan 9 2018, 9:45 AM
cfang accepted D41866: AMDGPU: Error in SIAnnotateControlFlow instead of assert.

LGTM

Jan 9 2018, 9:14 AM
cfang added inline comments to D41866: AMDGPU: Error in SIAnnotateControlFlow instead of assert.
Jan 9 2018, 8:53 AM

Jan 5 2018

cfang added inline comments to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.
Jan 5 2018, 8:56 AM
cfang added inline comments to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.
Jan 5 2018, 8:48 AM
cfang added inline comments to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.
Jan 5 2018, 8:44 AM

Jan 4 2018

cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Update based on reviewers' comments.

Jan 4 2018, 3:19 PM

Dec 20 2017

cfang added a comment to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

ping

Dec 20 2017, 10:49 AM
cfang added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

ping

Dec 20 2017, 10:48 AM

Dec 15 2017

cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Update based on Matt's recent comments. Thanks.

Dec 15 2017, 1:03 PM
cfang added inline comments to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
Dec 15 2017, 12:17 PM
cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Merge with the latest LLVM trunk.

Dec 15 2017, 9:33 AM

Dec 14 2017

cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
  1. Merge the patch with LLVM trunk.
  2. Update LIT tests to avoid specific registers.
Dec 14 2017, 2:09 PM

Dec 13 2017

cfang updated the diff for D39912: AMDGPU/SI: Implement d16 support for image intrinsics.
  1. Add definitions for new machine instructions for D16 bit and for whether the target supports packed/unpacked f16.
  2. Add a flag bit to indicate whether an instruction is referencing 16-bit half typed data (D16).
  3. Update to the LIT test cases to avoid using specific registers.
Dec 13 2017, 11:50 AM

Nov 22 2017

cfang added a comment to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
Nov 22 2017, 1:31 PM

Nov 16 2017

cfang added inline comments to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.
Nov 16 2017, 11:03 AM

Nov 14 2017

cfang updated subscribers of D39912: AMDGPU/SI: Implement d16 support for image intrinsics.
Nov 14 2017, 8:48 AM
cfang updated the diff for D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Add LIT tests.

Nov 14 2017, 8:41 AM

Nov 10 2017

cfang added a comment to D39912: AMDGPU/SI: Implement d16 support for image intrinsics.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

Nov 10 2017, 12:34 PM
cfang created D39912: AMDGPU/SI: Implement d16 support for image intrinsics.
Nov 10 2017, 11:13 AM

Nov 7 2017

cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Update based on Matt's comments.

Nov 7 2017, 11:37 AM
cfang added inline comments to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
Nov 7 2017, 11:36 AM

Nov 6 2017

cfang added inline comments to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
Nov 6 2017, 9:54 AM

Nov 3 2017

cfang added a reviewer for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics: b-sumner.
Nov 3 2017, 3:40 PM
cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
  1. Define a new feature, HasPackedD16VMem for gfx8.1 and beyond to guard the generation of VMem instructions with D16 bit set.
  2. Put buffer_store intrinsics implementation in the same patch as buffer_loads;
Nov 3 2017, 3:38 PM

Oct 27 2017

cfang updated the diff for D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.

Implement getTgtMemIntrinsic for buffer_load/tbuffer_load intrinsics.
Update based on Matt's review comments.

Oct 27 2017, 10:06 AM

Oct 24 2017

cfang added inline comments to D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics.
Oct 24 2017, 1:05 PM