ZhangKang (Zhang Kang)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 19 2018, 6:03 PM (31 w, 1 h)

Recent Activity

Sat, Jun 15

ZhangKang committed rG2d51adcb5714: [PowerPC] Set the innermost hot loop to align 32 bytes (authored by ZhangKang).
[PowerPC] Set the innermost hot loop to align 32 bytes
Sat, Jun 15, 8:09 AM
ZhangKang committed rL363495: [PowerPC] Set the innermost hot loop to align 32 bytes.
[PowerPC] Set the innermost hot loop to align 32 bytes
Sat, Jun 15, 8:07 AM
ZhangKang closed D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.
Sat, Jun 15, 8:07 AM · Restricted Project
ZhangKang updated the diff for D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

The old test case fails with the latest code, so I have updated it.

Sat, Jun 15, 7:55 AM · Restricted Project

Thu, Jun 13

ZhangKang updated the diff for D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Update the patch to remove the info about PGO.

Thu, Jun 13, 11:46 PM · Restricted Project
ZhangKang updated the diff for D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Modify the comments.

Thu, Jun 13, 8:32 PM · Restricted Project
ZhangKang retitled D61228: [PowerPC] Set the innermost hot loop to align 32 bytes from [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes to [PowerPC] Set the innermost hot loop to align 32 bytes.
Thu, Jun 13, 8:28 PM · Restricted Project
ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

@nemanjai @hfinkel I have updated the patch to align to 32 bytes even without PGO data.

Thu, Jun 13, 8:27 AM · Restricted Project
ZhangKang updated the diff for D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Updated the patch to align the innermost hot loop to 32 bytes even if there is no PGO data.

Thu, Jun 13, 8:24 AM · Restricted Project

Tue, Jun 11

ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?

Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.

Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.

TL; DR;
The hotness of the loop is checked in MachineBlockPlacement::alignBlocks(). I think the concern with aligning all loops statically determined to be "hot" to 32-bytes runs the risk of a pathologically bad case such as the following:

for (int i = 0; i < HugeValue; i++) {
  // Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
  for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
    // Do something short
}

Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).
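
The figure of 7 nops follows from simple padding arithmetic. The sketch below assumes fixed 4-byte instructions; nopsToAlign is a hypothetical helper written for illustration and is not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API: number of 4-byte nops needed to pad
// from byte offset Offset up to the next multiple of Align.
constexpr std::uint64_t nopsToAlign(std::uint64_t Offset, std::uint64_t Align) {
  constexpr std::uint64_t InstrBytes = 4; // assumed fixed instruction width
  const std::uint64_t Padding = (Align - Offset % Align) % Align;
  return Padding / InstrBytes;
}

// Exercises the worst case from the comment and two neighbouring cases.
inline void checkPaddingExamples() {
  assert(nopsToAlign(4, 32) == 7);  // one instruction past a 32-byte boundary
  assert(nopsToAlign(32, 32) == 0); // already aligned, no padding
  assert(nopsToAlign(28, 32) == 1); // one instruction short of a boundary
}
```

One instruction past the boundary is the worst case: 28 bytes of padding, which is seven 4-byte nops executed on every iteration of the outer loop.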

Tue, Jun 11, 11:46 PM · Restricted Project

Thu, May 30

ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?

Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.

Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.

TL; DR;
The hotness of the loop is checked in MachineBlockPlacement::alignBlocks(). I think the concern with aligning all loops statically determined to be "hot" to 32-bytes runs the risk of a pathologically bad case such as the following:

for (int i = 0; i < HugeValue; i++) {
  // Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
  for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
    // Do something short
}

Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).

Thu, May 30, 7:09 PM · Restricted Project

May 16 2019

ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

@hfinkel, do you have any other comments?

May 16 2019, 1:04 AM · Restricted Project

May 2 2019

ZhangKang committed rG1a0d6d689923: [NFC][PowerPC] Return early if the element type is not byte-sized in… (authored by ZhangKang).
[NFC][PowerPC] Return early if the element type is not byte-sized in…
May 2 2019, 1:13 AM
ZhangKang committed rL359764: [NFC][PowerPC] Return early if the element type is not byte-sized in….
[NFC][PowerPC] Return early if the element type is not byte-sized in…
May 2 2019, 1:13 AM
ZhangKang closed D61076: [NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads.
May 2 2019, 1:13 AM · Restricted Project

Apr 29 2019

ZhangKang committed rGd43b66b3187c: [NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll (authored by ZhangKang).
[NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll
Apr 29 2019, 8:38 PM
ZhangKang committed rL359533: [NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll.
[NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll
Apr 29 2019, 8:37 PM
ZhangKang closed D61227: [NFC]][PowerPC] Use -check-prefixes to simplify the check in code-align.ll.
Apr 29 2019, 8:37 PM · Restricted Project
ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

In some special cases, performance on ppc improves by more than 30% with this patch.

Any significant regressions?

Apr 29 2019, 7:33 PM · Restricted Project
ZhangKang added a comment to D60811: [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS().

I guess that's right, technically; "getScalarSizeInBits() / 8" will round down, and getStoreSize() will round up. But still, please fix the code so it's clear that check is happening.
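
The rounding difference described in this comment can be sketched with plain integer arithmetic. The three functions below are hypothetical stand-ins that model the two computations and a byte-sized check; they are not the LLVM methods themselves.

```cpp
#include <cassert>

// Hypothetical stand-ins, not LLVM APIs.
// Integer division truncates, so a bit width divided by 8 rounds down.
constexpr unsigned bytesRoundedDown(unsigned Bits) { return Bits / 8; }

// Adding 7 before dividing rounds up to a whole number of bytes.
constexpr unsigned bytesRoundedUp(unsigned Bits) { return (Bits + 7) / 8; }

// A width is byte-sized when it is a multiple of 8 bits.
constexpr bool isByteSized(unsigned Bits) { return Bits % 8 == 0; }

// The two results differ only when the width is not byte-sized.
static_assert(bytesRoundedDown(17) == 2 && bytesRoundedUp(17) == 3, "17 bits");
static_assert(bytesRoundedDown(32) == 4 && bytesRoundedUp(32) == 4, "32 bits");
```

For byte-sized widths the two computations agree, so an explicit byte-sized check makes the assumption visible instead of leaving it implied by the rounding.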

Apr 29 2019, 7:00 PM · Restricted Project

Apr 27 2019

ZhangKang created D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.
Apr 27 2019, 9:44 AM · Restricted Project
ZhangKang updated the summary of D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.
Apr 27 2019, 9:44 AM · Restricted Project
ZhangKang created D61227: [NFC]][PowerPC] Use -check-prefixes to simplify the check in code-align.ll.
Apr 27 2019, 9:16 AM · Restricted Project

Apr 24 2019

ZhangKang created D61076: [NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads.
Apr 24 2019, 9:50 AM · Restricted Project

Apr 18 2019

ZhangKang committed rG009a21d2fdff: [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS() (authored by ZhangKang).
[PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS()
Apr 18 2019, 12:28 AM
ZhangKang committed rL358644: [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS().
[PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS()
Apr 18 2019, 12:22 AM
ZhangKang closed D60811: [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS().
Apr 18 2019, 12:22 AM · Restricted Project

Apr 16 2019

ZhangKang added a comment to D60564: Changes for LLVM PPCISelLowering function combineBVOfConsecutiveLoads.

This patch has been abandoned; the new patch for this bug is at https://reviews.llvm.org/D60811.

Apr 16 2019, 9:53 PM · Restricted Project
ZhangKang created D60811: [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS().
Apr 16 2019, 9:47 PM · Restricted Project

Apr 12 2019

ZhangKang committed rG2446f843aeea: [PowerPC] Add initialization for some ppc passes (authored by ZhangKang).
[PowerPC] Add initialization for some ppc passes
Apr 12 2019, 3:01 AM
ZhangKang committed rL358271: [PowerPC] Add initialization for some ppc passes.
[PowerPC] Add initialization for some ppc passes
Apr 12 2019, 2:58 AM
ZhangKang updated the diff for D60248: [PowerPC] Add initialization for some ppc passes.

Add mtriple=powerpc64le-unknown-unknown for the test files.

Apr 12 2019, 2:28 AM · Restricted Project

Apr 11 2019

ZhangKang committed rG6f8f98ce8de7: [PowerPC] Add initialization for some ppc passes (authored by ZhangKang).
[PowerPC] Add initialization for some ppc passes
Apr 11 2019, 11:38 PM
ZhangKang committed rL358256: [PowerPC] Add initialization for some ppc passes.
[PowerPC] Add initialization for some ppc passes
Apr 11 2019, 11:33 PM
ZhangKang closed D60248: [PowerPC] Add initialization for some ppc passes.
Apr 11 2019, 11:33 PM · Restricted Project
ZhangKang added a comment to D60564: Changes for LLVM PPCISelLowering function combineBVOfConsecutiveLoads.

@jsji, I will talk with @sarveshtamba to check and add the test case.

Apr 11 2019, 9:34 AM · Restricted Project

Apr 3 2019

ZhangKang created D60248: [PowerPC] Add initialization for some ppc passes.
Apr 3 2019, 10:30 PM · Restricted Project

Mar 29 2019

ZhangKang committed rGe5ac385fb1ff: [PowerPC] Add the support for __builtin_setrnd() in clang (authored by ZhangKang).
[PowerPC] Add the support for __builtin_setrnd() in clang
Mar 29 2019, 2:11 AM
ZhangKang committed rC357242: [PowerPC] Add the support for __builtin_setrnd() in clang.
[PowerPC] Add the support for __builtin_setrnd() in clang
Mar 29 2019, 2:11 AM
ZhangKang committed rL357242: [PowerPC] Add the support for __builtin_setrnd() in clang.
[PowerPC] Add the support for __builtin_setrnd() in clang
Mar 29 2019, 2:10 AM
ZhangKang closed D59403: [PowerPC] Add the support for __builtin_setrnd() in clang.
Mar 29 2019, 2:10 AM · Restricted Project
ZhangKang committed rG05f78b35ae82: [PowerPC] Add the support for __builtin_setrnd() (authored by ZhangKang).
[PowerPC] Add the support for __builtin_setrnd()
Mar 29 2019, 1:46 AM
ZhangKang committed rL357241: [PowerPC] Add the support for __builtin_setrnd().
[PowerPC] Add the support for __builtin_setrnd()
Mar 29 2019, 1:44 AM
ZhangKang closed D59405: [PowerPC] Add the support for __builtin_setrnd().
Mar 29 2019, 1:43 AM · Restricted Project

Mar 28 2019

ZhangKang added inline comments to D59405: [PowerPC] Add the support for __builtin_setrnd().
Mar 28 2019, 7:18 PM · Restricted Project
ZhangKang updated the diff for D59405: [PowerPC] Add the support for __builtin_setrnd().

Modify the comments to follow the reviewer's comments.

Mar 28 2019, 7:18 PM · Restricted Project

Mar 14 2019

ZhangKang added a reviewer for D59403: [PowerPC] Add the support for __builtin_setrnd() in clang: jsji.
Mar 14 2019, 8:49 PM · Restricted Project
ZhangKang removed a reviewer for D59403: [PowerPC] Add the support for __builtin_setrnd() in clang: jsji.
Mar 14 2019, 8:49 PM · Restricted Project
ZhangKang updated the diff for D59403: [PowerPC] Add the support for __builtin_setrnd() in clang.

Update the patch.

Mar 14 2019, 8:46 PM · Restricted Project
ZhangKang updated the diff for D59403: [PowerPC] Add the support for __builtin_setrnd() in clang.

Have checked the missing files.

Mar 14 2019, 8:40 PM · Restricted Project
ZhangKang updated the diff for D59405: [PowerPC] Add the support for __builtin_setrnd().

Add the test file.

Mar 14 2019, 8:33 PM · Restricted Project
ZhangKang created D59405: [PowerPC] Add the support for __builtin_setrnd().
Mar 14 2019, 8:01 PM · Restricted Project
ZhangKang created D59403: [PowerPC] Add the support for __builtin_setrnd() in clang.
Mar 14 2019, 7:31 PM · Restricted Project

Feb 24 2019

ZhangKang committed rG4faa4090c9e2: [PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction… (authored by ZhangKang).
[PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction…
Feb 24 2019, 6:46 PM
ZhangKang committed rL354762: [PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction….
[PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction…
Feb 24 2019, 6:45 PM
ZhangKang closed D58430: [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts.
Feb 24 2019, 6:45 PM · Restricted Project

Feb 23 2019

ZhangKang added a comment to D58430: [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts.

Have modified the comments.

Feb 23 2019, 1:02 AM · Restricted Project
ZhangKang updated the diff for D58430: [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts.

Modify the comments to follow the reviewers' suggestions.

Feb 23 2019, 1:00 AM · Restricted Project

Feb 20 2019

ZhangKang added reviewers for D58430: [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts: echristo, hiraditya.
Feb 20 2019, 9:08 PM · Restricted Project
ZhangKang edited reviewers for D58430: [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts, added: hfinkel; removed: llvm-commits, power-llvm-team, echristo, hiraditya.
Feb 20 2019, 1:55 AM · Restricted Project
ZhangKang created D58430: [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts.
Feb 20 2019, 1:07 AM · Restricted Project

Dec 30 2018

quangthong81 awarded rL350165: [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that… a Like token.
Dec 30 2018, 8:17 AM
ZhangKang committed rL350165: [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that….
[PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that…
Dec 30 2018, 7:17 AM
ZhangKang closed D56148: [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code.
Dec 30 2018, 7:17 AM
ZhangKang updated the diff for D56148: [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code.

Put if (!isPatchPoint) within if (isSVR4ABI && isPPC64) to follow nemanjai's suggestion.

Dec 30 2018, 1:27 AM
ZhangKang added inline comments to D56148: [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code.
Dec 30 2018, 1:27 AM
ZhangKang added a comment to D56148: [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code.

So this rell-request

typo in the summary "rell"

Dec 30 2018, 12:28 AM
ZhangKang updated the summary of D56148: [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code.
Dec 30 2018, 12:28 AM

Dec 29 2018

ZhangKang committed rL350161: [PowerPC] Fix ADDE, SUBE do not know how to promote operator.
[PowerPC] Fix ADDE, SUBE do not know how to promote operator
Dec 29 2018, 11:51 PM
ZhangKang closed D56119: [PowerPC] Fix ADDE, SUBE do not know how to promote operator.
Dec 29 2018, 11:51 PM
ZhangKang updated the summary of D56119: [PowerPC] Fix ADDE, SUBE do not know how to promote operator.
Dec 29 2018, 11:16 PM
ZhangKang created D56148: [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code.
Dec 29 2018, 6:36 AM
ZhangKang added inline comments to D56119: [PowerPC] Fix ADDE, SUBE do not know how to promote operator.
Dec 29 2018, 5:54 AM
ZhangKang updated the diff for D56119: [PowerPC] Fix ADDE, SUBE do not know how to promote operator.

Fix the spelling errors.

Dec 29 2018, 5:52 AM

Dec 28 2018

ZhangKang added inline comments to D56119: [PowerPC] Fix ADDE, SUBE do not know how to promote operator.
Dec 28 2018, 9:23 AM
ZhangKang updated the diff for D56119: [PowerPC] Fix ADDE, SUBE do not know how to promote operator.

Add the comments for PromoteIntRes_ADDSUBCARRY.

Dec 28 2018, 9:22 AM

Dec 27 2018

ZhangKang added reviewers for D56119: [PowerPC] Fix ADDE, SUBE do not know how to promote operator: efriedma, bogner.
Dec 27 2018, 11:07 PM
ZhangKang created D56119: [PowerPC] Fix ADDE, SUBE do not know how to promote operator.
Dec 27 2018, 10:21 PM

Dec 24 2018

ZhangKang committed rL350061: [PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue.
[PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue
Dec 24 2018, 7:33 PM
ZhangKang closed D55977: [PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue.
Dec 24 2018, 7:33 PM

Dec 21 2018

ZhangKang added a comment to D55977: [PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue.

LGTM

Also, if there's a way to have a non-asserts-required test case, that would be useful. How did you notice the problem?

Dec 21 2018, 11:07 PM

Dec 20 2018

ZhangKang created D55977: [PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue.
Dec 20 2018, 6:07 PM

Dec 19 2018

ZhangKang committed rL349727: [PowerPC] Implement the isSelectSupported() target hook.
[PowerPC] Implement the isSelectSupported() target hook
Dec 19 2018, 10:23 PM
ZhangKang closed D55754: [PowerPC] Implement the ”isSelectSupported()“ target hook.
Dec 19 2018, 10:23 PM

Dec 18 2018

ZhangKang added inline comments to D55754: [PowerPC] Implement the ”isSelectSupported()“ target hook.
Dec 18 2018, 6:37 PM

Dec 16 2018

ZhangKang updated the diff for D55754: [PowerPC] Implement the ”isSelectSupported()“ target hook.

Add a space to the right of != so that Kind !=SelectSupportKind::ScalarCondVectorVal becomes Kind != SelectSupportKind::ScalarCondVectorVal.

Dec 16 2018, 7:13 PM
ZhangKang created D55754: [PowerPC] Implement the ”isSelectSupported()“ target hook.
Dec 16 2018, 6:43 PM

Dec 7 2018

ZhangKang committed rL348572: [PowerPC] VSX register support for inline assembly.
[PowerPC] VSX register support for inline assembly
Dec 7 2018, 1:02 AM
ZhangKang committed rC348572: [PowerPC] VSX register support for inline assembly.
[PowerPC] VSX register support for inline assembly
Dec 7 2018, 1:02 AM
ZhangKang closed D55192: [PowerPC] VSX register support for inline assembly.
Dec 7 2018, 1:02 AM
ZhangKang updated the diff for D55192: [PowerPC] VSX register support for inline assembly.

Because ELFv1 differs from ELFv2, I added a guard before using GCCAddlRegNames.

Dec 7 2018, 12:06 AM

Dec 6 2018

ZhangKang updated the diff for D55192: [PowerPC] VSX register support for inline assembly.

Add a comment for the array AddlRegName explaining how these RegNum encodings are defined.

Dec 6 2018, 9:07 AM
ZhangKang added inline comments to D55192: [PowerPC] VSX register support for inline assembly.
Dec 6 2018, 8:04 AM

Dec 5 2018

ZhangKang added inline comments to D55192: [PowerPC] VSX register support for inline assembly.
Dec 5 2018, 10:26 PM
ZhangKang added a comment to D55192: [PowerPC] VSX register support for inline assembly.

@jsji I have uploaded a new patch that fixes the incorrect VSX register index mapping in the old patch.

Dec 5 2018, 8:40 PM
ZhangKang updated the diff for D55192: [PowerPC] VSX register support for inline assembly.

Fix the VSX register index mapping of the array GCCAddlRegNames.

Dec 5 2018, 8:39 PM

Dec 4 2018

ZhangKang added a comment to D55192: [PowerPC] VSX register support for inline assembly.

@jsji, I have uploaded a new patch that avoids renaming vs32 to v0 in the clobber list.

Dec 4 2018, 1:46 AM
ZhangKang updated the diff for D55192: [PowerPC] VSX register support for inline assembly.

The old patch renamed vs32 to v0 in the clobber list, which is not very reasonable.
The new patch overrides getGCCAddlRegNames to map vs* to register numbers, which avoids the unnecessary renaming of vs32 to v0 in the clobber list.

Dec 4 2018, 1:37 AM

Dec 3 2018

ZhangKang created D55192: [PowerPC] VSX register support for inline assembly.
Dec 3 2018, 2:26 AM

Dec 2 2018

ZhangKang committed rL348109: [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction.
[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction
Dec 2 2018, 7:36 PM