
ZhangKang (Zhang Kang)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 19 2018, 6:03 PM (43 w, 4 d)

Recent Activity

Fri, Sep 6

ZhangKang committed rGf879c6875563: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the… (authored by ZhangKang).
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Fri, Sep 6, 1:16 AM
ZhangKang committed rL371177: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the….
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Fri, Sep 6, 1:15 AM
ZhangKang closed D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.
Fri, Sep 6, 1:14 AM · Restricted Project

Tue, Sep 3

ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Fix the error for jump table.

Tue, Sep 3, 11:31 PM · Restricted Project

Mon, Sep 2

ZhangKang committed rL370692: Request commit access for zhangkang.
Request commit access for zhangkang
Mon, Sep 2, 7:48 PM

Aug 21 2019

ZhangKang added a comment to D59403: [PowerPC] Add the support for __builtin_setrnd() in clang.

Looks like you did not commit the version (Diff 190782) that was accepted!

As a result, the committed version (Diff 192788) introduced duplicate documents.
Thanks @davezarzycki for noticing this and fixing it in https://reviews.llvm.org/rL369496.

@ZhangKang Please pay more attention next time. Thanks.

Aug 21 2019, 7:38 AM · Restricted Project

Aug 17 2019

ZhangKang committed rGb3d258fc44b5: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the… (authored by ZhangKang).
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Aug 17 2019, 7:39 AM
ZhangKang committed rL369191: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the….
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Aug 17 2019, 7:36 AM

Aug 15 2019

ZhangKang committed rG2a903c0b679b: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the… (authored by ZhangKang).
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Aug 15 2019, 6:08 AM
ZhangKang committed rL368997: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the….
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Aug 15 2019, 6:04 AM

Aug 12 2019

ZhangKang committed rG2a9efbf2484d: [NFC][PowerPC] Add the test case shrink-wrap.mir and shrink-wrap.ll for PPC (authored by ZhangKang).
[NFC][PowerPC] Add the test case shrink-wrap.mir and shrink-wrap.ll for PPC
Aug 12 2019, 10:52 AM
ZhangKang committed rL368597: [NFC][PowerPC] Add the test case shrink-wrap.mir and shrink-wrap.ll for PPC.
[NFC][PowerPC] Add the test case shrink-wrap.mir and shrink-wrap.ll for PPC
Aug 12 2019, 10:52 AM
ZhangKang committed rG489efc68a572: Revert r368565: [CodeGen] Do the Simple Early Return in block-placement pass to… (authored by ZhangKang).
Revert r368565: [CodeGen] Do the Simple Early Return in block-placement pass to…
Aug 12 2019, 7:01 AM
ZhangKang committed rL368574: Revert r368565: [CodeGen] Do the Simple Early Return in block-placement pass to….
Revert r368565: [CodeGen] Do the Simple Early Return in block-placement pass to…
Aug 12 2019, 7:00 AM
ZhangKang committed rG342fb0db6d98: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the… (authored by ZhangKang).
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Aug 12 2019, 6:16 AM
ZhangKang committed rL368565: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the….
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Aug 12 2019, 6:14 AM

Aug 11 2019

ZhangKang committed rGb1a62d168f8c: [NFC][CodeGen] Use while loop instead for loop in MachineBlockPlacement… (authored by ZhangKang).
[NFC][CodeGen] Use while loop instead for loop in MachineBlockPlacement…
Aug 11 2019, 6:02 AM
ZhangKang committed rL368532: [NFC][CodeGen] Use while loop instead for loop in MachineBlockPlacement….
[NFC][CodeGen] Use while loop instead for loop in MachineBlockPlacement…
Aug 11 2019, 5:58 AM

Aug 10 2019

ZhangKang committed rG555f7495df1c: [NFC][CodeGen] Modify the PI++ to ++PI in MachineBlockPlacement… (authored by ZhangKang).
[NFC][CodeGen] Modify the PI++ to ++PI in MachineBlockPlacement…
Aug 10 2019, 9:23 AM
ZhangKang committed rL368514: [NFC][CodeGen] Modify the PI++ to ++PI in MachineBlockPlacement….
[NFC][CodeGen] Modify the PI++ to ++PI in MachineBlockPlacement…
Aug 10 2019, 9:22 AM
ZhangKang committed rG36cd84bdd9a7: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the… (authored by ZhangKang).
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Aug 10 2019, 3:02 AM
ZhangKang committed rL368509: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the….
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the…
Aug 10 2019, 3:02 AM
ZhangKang closed D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.
Aug 10 2019, 3:02 AM · Restricted Project
ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

The patch "[MBP] Disable aggressive loop rotate in plain mode" modified the test case, so I updated the test case here to stay in sync.

Aug 10 2019, 2:33 AM · Restricted Project

Aug 5 2019

ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

@efriedma , I have updated the patch. Do you have any comments?

Aug 5 2019, 6:32 PM · Restricted Project

Aug 3 2019

ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Add the comment about post-dominator tree.

Aug 3 2019, 8:24 AM · Restricted Project
ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Can you remove the dead MachineDominatorTree *MDT; declaration?

The function MachineBlockPlacement::runOnMachineFunction does some cleanup work and never uses the MachinePostDominatorTree info, and this pass doesn't preserve the MachinePostDominatorTree.

Please add this explanation as an explicit comment in the code.

Aug 3 2019, 7:57 AM · Restricted Project

Aug 2 2019

ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Update the MachineLoopInfo.

Aug 2 2019, 1:16 AM · Restricted Project
ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Okay, delaying some of the work makes sense, since you can't really modify FunctionChain while you're iterating over it.

Any response to my other comment?

Looking over this patch again, some of the other work involved in updating various data structures isn't complete here; the MachineDominatorTree/MachinePostDominatorTree isn't updated, MachineLoopInfo isn't updated.

Aug 2 2019, 12:57 AM · Restricted Project

Aug 1 2019

ZhangKang committed rG038dd43782b0: [NFC][CodeGen] Modify the type element of TailCalls to simplify the… (authored by ZhangKang).
[NFC][CodeGen] Modify the type element of TailCalls to simplify the…
Aug 1 2019, 8:11 PM
ZhangKang committed rL367644: [NFC][CodeGen] Modify the type element of TailCalls to simplify the….
[NFC][CodeGen] Modify the type element of TailCalls to simplify the…
Aug 1 2019, 8:11 PM
ZhangKang closed D64905: [NFC][CodeGen] Modify the type element of TailCalls to simplify the dupRetToEnableTailCallOpts().
Aug 1 2019, 8:11 PM · Restricted Project
ZhangKang added inline comments to D64905: [NFC][CodeGen] Modify the type element of TailCalls to simplify the dupRetToEnableTailCallOpts().
Aug 1 2019, 8:06 PM · Restricted Project
ZhangKang updated the diff for D64905: [NFC][CodeGen] Modify the type element of TailCalls to simplify the dupRetToEnableTailCallOpts().

Use the for range loop.

Aug 1 2019, 8:06 PM · Restricted Project

Jul 31 2019

ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Move the EmptyBB cleanup work out of the for loop.

Jul 31 2019, 12:19 AM · Restricted Project
ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Here, if I use F->erase(TBB), the memory leak error still exists.

What exactly is leaking? (If you're calling erase(), it isn't the MBB itself.)

Of course, that isn't a substitute for calling FunctionChain.remove etc.

Looking over this patch again, some of the other work involved in updating various data structures isn't complete here; the MachineDominatorTree/MachinePostDominatorTree isn't updated, MachineLoopInfo isn't updated.

Jul 31 2019, 12:06 AM · Restricted Project

Jul 29 2019

ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

You probably want F->erase(TBB), which both removes TBB from the list of blocks in the function, and deallocates TBB. I guess the empty block with no predecessors doesn't really matter much, in the long run, but easier to understand if the transform cleans up after itself properly.

Jul 29 2019, 9:53 PM · Restricted Project

Jul 27 2019

ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

The old patch had a memory leak error.
The new patch fixes the memory leak error by using:

FunctionChain.remove(TBB);
BlockToChain.erase(TBB);

instead of

F->remove(TBB);
Jul 27 2019, 2:12 AM · Restricted Project
ZhangKang retitled D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks from [PowerPC] Do the Simple Early Return in block-placement pass to optimize the blocks to [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.
Jul 27 2019, 1:51 AM · Restricted Project

Jul 25 2019

ZhangKang committed rG4e794a8bae00: Some case eror for: detected memory leaks (authored by ZhangKang).
Some case eror for: detected memory leaks
Jul 25 2019, 8:27 PM
ZhangKang committed rL367083: Some case eror for: detected memory leaks.
Some case eror for: detected memory leaks
Jul 25 2019, 8:25 PM
ZhangKang committed rG5c6101545583: [PowerPC] Do the Simple Early Return in block-placement pass to optimize the… (authored by ZhangKang).
[PowerPC] Do the Simple Early Return in block-placement pass to optimize the…
Jul 25 2019, 7:01 PM
ZhangKang committed rL367080: [PowerPC] Do the Simple Early Return in block-placement pass to optimize the….
[PowerPC] Do the Simple Early Return in block-placement pass to optimize the…
Jul 25 2019, 7:01 PM
ZhangKang closed D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.
Jul 25 2019, 7:01 PM · Restricted Project
ZhangKang retitled D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks from [PowerPC] Do the Early Return for the li and unconditional branch to [PowerPC] Do the Simple Early Return in block-placement pass to optimize the blocks.
Jul 25 2019, 6:55 PM · Restricted Project

Jul 24 2019

ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Modify the comment and fix the typo.

Jul 24 2019, 6:57 PM · Restricted Project
ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Have updated the patch.

Jul 24 2019, 6:57 PM · Restricted Project
ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Have updated the patch.

Jul 24 2019, 6:41 AM · Restricted Project
ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Modify the test and update the comments.

Jul 24 2019, 6:41 AM · Restricted Project

Jul 18 2019

ZhangKang committed rGca9f68e55e43: [NFC][PowerPC] Modify the test case add_cmp.ll (authored by ZhangKang).
[NFC][PowerPC] Modify the test case add_cmp.ll
Jul 18 2019, 7:25 PM
ZhangKang committed rL366526: [NFC][PowerPC] Modify the test case add_cmp.ll.
[NFC][PowerPC] Modify the test case add_cmp.ll
Jul 18 2019, 7:24 PM
ZhangKang created D64905: [NFC][CodeGen] Modify the type element of TailCalls to simplify the dupRetToEnableTailCallOpts().
Jul 18 2019, 12:44 AM · Restricted Project
ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Modify the comments and add test.

Jul 18 2019, 12:29 AM · Restricted Project
ZhangKang added inline comments to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.
Jul 18 2019, 12:24 AM · Restricted Project

Jul 17 2019

ZhangKang committed rG33a4336bcd15: [NFC][PowerPC] Add the test to test the pass block-placement (authored by ZhangKang).
[NFC][PowerPC] Add the test to test the pass block-placement
Jul 17 2019, 11:57 PM
ZhangKang committed rL366407: [NFC][PowerPC] Add the test to test the pass block-placement.
[NFC][PowerPC] Add the test to test the pass block-placement
Jul 17 2019, 11:57 PM

Jul 15 2019

ZhangKang added inline comments to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.
Jul 15 2019, 7:12 PM · Restricted Project

Jul 14 2019

ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

@efriedma , I have updated this patch to avoid calling tail-duplication again after the block-placement pass.
At the end of block-placement I will do the optimization for the unconditional branch; this pattern may be created by the block-placement pass.

Jul 14 2019, 11:03 PM · Restricted Project
ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

This new patch will do the optimization below at the end of the block-placement pass:

bb.1:
  B %bb.11
Jul 14 2019, 10:57 PM · Restricted Project
ZhangKang committed rG776ac79e88dd: [NFC][PowerPC] Add the test block-placement.mir (authored by ZhangKang).
[NFC][PowerPC] Add the test block-placement.mir
Jul 14 2019, 8:59 PM
ZhangKang committed rL366037: [NFC][PowerPC] Add the test block-placement.mir.
[NFC][PowerPC] Add the test block-placement.mir
Jul 14 2019, 8:55 PM

Jul 7 2019

ZhangKang committed rG638b1a82d80f: [NFC][PowerPC] Add the test add_cmp.ll (authored by ZhangKang).
[NFC][PowerPC] Add the test add_cmp.ll
Jul 7 2019, 6:57 PM
ZhangKang committed rL365285: [NFC][PowerPC] Add the test add_cmp.ll.
[NFC][PowerPC] Add the test add_cmp.ll
Jul 7 2019, 6:56 PM

Jul 6 2019

ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

The line 29 b .LBB0_10 is created after running the pass branch-folder,

I did a quick test with -print-after-all, and it looks like it's actually created by MachineBlockPlacement?

If we're going to do this transform, we should use the existing TailDup code to do it, not reimplement it in PPCEarlyReturn. Would it make sense to re-run the entire tail duplication pass after block placement? Or should we try to do something more targeted?

Jul 6 2019, 8:11 AM · Restricted Project

Jul 3 2019

ZhangKang added a comment to D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

This seems like simple tail duplication, which the target-independent taildup pass should handle. Can you give an example which taildup doesn't handle?

Jul 3 2019, 8:41 AM · Restricted Project

Jun 29 2019

ZhangKang updated the diff for D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.

Modify the comments.

Jun 29 2019, 2:32 AM · Restricted Project
ZhangKang created D63972: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks.
Jun 29 2019, 1:45 AM · Restricted Project

Jun 26 2019

ZhangKang committed rG490bc46541c8: [NFC][PowerPC] Improve the for loop in Early Return (authored by ZhangKang).
[NFC][PowerPC] Improve the for loop in Early Return
Jun 26 2019, 8:40 PM
ZhangKang committed rL364496: [NFC][PowerPC] Improve the for loop in Early Return.
[NFC][PowerPC] Improve the for loop in Early Return
Jun 26 2019, 8:40 PM
ZhangKang closed D63800: [NFC][PowerPC] Improve the for loop in Early Return.
Jun 26 2019, 8:40 PM · Restricted Project

Jun 25 2019

ZhangKang updated the summary of D63800: [NFC][PowerPC] Improve the for loop in Early Return.
Jun 25 2019, 6:54 PM · Restricted Project
ZhangKang created D63800: [NFC][PowerPC] Improve the for loop in Early Return.
Jun 25 2019, 6:35 PM · Restricted Project

Jun 15 2019

ZhangKang committed rG2d51adcb5714: [PowerPC] Set the innermost hot loop to align 32 bytes (authored by ZhangKang).
[PowerPC] Set the innermost hot loop to align 32 bytes
Jun 15 2019, 8:09 AM
ZhangKang committed rL363495: [PowerPC] Set the innermost hot loop to align 32 bytes.
[PowerPC] Set the innermost hot loop to align 32 bytes
Jun 15 2019, 8:07 AM
ZhangKang closed D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.
Jun 15 2019, 8:07 AM · Restricted Project
ZhangKang updated the diff for D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

The old test case fails with the latest code, so I have updated the test case.

Jun 15 2019, 7:55 AM · Restricted Project

Jun 13 2019

ZhangKang updated the diff for D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Update the patch to remove the info about PGO.

Jun 13 2019, 11:46 PM · Restricted Project
ZhangKang updated the diff for D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Modify the comments.

Jun 13 2019, 8:32 PM · Restricted Project
ZhangKang retitled D61228: [PowerPC] Set the innermost hot loop to align 32 bytes from [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes to [PowerPC] Set the innermost hot loop to align 32 bytes.
Jun 13 2019, 8:28 PM · Restricted Project
ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

@nemanjai @hfinkel I have updated the patch to align to 32 bytes even without PGO data.

Jun 13 2019, 8:27 AM · Restricted Project
ZhangKang updated the diff for D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Updated the patch to align the innermost hot loop to 32 bytes even if there is no PGO data.

Jun 13 2019, 8:24 AM · Restricted Project

Jun 11 2019

ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?

Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.

Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.

TL; DR;
The hotness of the loop is checked in MachineBlockPlacement::alignBlocks(). I think the concern is that aligning all loops statically determined to be "hot" to 32 bytes runs the risk of a pathologically bad case such as the following:

for (int i = 0; i < HugeValue; i++) {
  // Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
  for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
    // Do something short
}

Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).

Jun 11 2019, 11:46 PM · Restricted Project
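
The "7 nops" figure in the comment above follows from simple alignment arithmetic. The sketch below is only an illustration, not code from the patch: it assumes fixed 4-byte instructions (as on PowerPC), and the names InstrBytes, paddingBytes, and nopsToAlign are hypothetical helpers introduced here. A loop that starts one instruction (4 bytes) past a 32-byte boundary needs 28 bytes of padding, which is 7 nops.

```cpp
#include <cstdint>

// Assumption for this sketch: every instruction, including a nop, is 4 bytes.
constexpr std::uint64_t InstrBytes = 4;

// Bytes of padding needed to move Offset up to the next multiple of Align.
// Returns 0 when Offset is already aligned.
constexpr std::uint64_t paddingBytes(std::uint64_t Offset, std::uint64_t Align) {
  return (Align - Offset % Align) % Align;
}

// Number of nop instructions needed to fill that padding.
constexpr std::uint64_t nopsToAlign(std::uint64_t Offset, std::uint64_t Align) {
  return paddingBytes(Offset, Align) / InstrBytes;
}

// One instruction past a 32-byte boundary: 28 bytes of padding, i.e. 7 nops.
static_assert(nopsToAlign(36, 32) == 7, "worst case is 7 nops");
// Already aligned: no padding at all.
static_assert(nopsToAlign(64, 32) == 0, "aligned offset needs no nops");
```

Under these assumptions 7 is also the worst case for 32-byte alignment, since any larger gap would already reach the next boundary.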

May 30 2019

ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

Okay, but we just call MBB->getParent()->getFunction().hasProfileData(), where do we actually check that the loop is hot?

Also, even if we align the majority of loops, how much does that really cost us? The code-size impact could be minor compared to the perf improvement, and if so, we should just always do it. It is still true that most users don't use PGO.

Short story: yes, I agree that we should probably just do this regardless of PGO if we don't see any significant performance regressions on important benchmarks.

TL; DR;
The hotness of the loop is checked in MachineBlockPlacement::alignBlocks(). I think the concern is that aligning all loops statically determined to be "hot" to 32 bytes runs the risk of a pathologically bad case such as the following:

for (int i = 0; i < HugeValue; i++) {
  // Enough instructions to make the inner loop fall one instruction past a 32-byte boundary
  for (int j = 0; j < UnpredictableValueHighlyLikelyToBeZero; j++)
    // Do something short
}

Such a case would end up with 7 nops to align the inner loop which would presumably tie up dispatch slots. All that being said, I just ran an experiment with exactly that pathological case and the performance degrades by about 1% (which may even be in the noise).

May 30 2019, 7:09 PM · Restricted Project

May 16 2019

ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

@hfinkel , do you have any other comments?

May 16 2019, 1:04 AM · Restricted Project

May 2 2019

ZhangKang committed rG1a0d6d689923: [NFC][PowerPC] Return early if the element type is not byte-sized in… (authored by ZhangKang).
[NFC][PowerPC] Return early if the element type is not byte-sized in…
May 2 2019, 1:13 AM
ZhangKang committed rL359764: [NFC][PowerPC] Return early if the element type is not byte-sized in….
[NFC][PowerPC] Return early if the element type is not byte-sized in…
May 2 2019, 1:13 AM
ZhangKang closed D61076: [NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads.
May 2 2019, 1:13 AM · Restricted Project

Apr 29 2019

ZhangKang committed rGd43b66b3187c: [NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll (authored by ZhangKang).
[NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll
Apr 29 2019, 8:38 PM
ZhangKang committed rL359533: [NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll.
[NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll
Apr 29 2019, 8:37 PM
ZhangKang closed D61227: [NFC]][PowerPC] Use -check-prefixes to simplify the check in code-align.ll.
Apr 29 2019, 8:37 PM · Restricted Project
ZhangKang added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

For some special cases, the performance can improve more than 30% after adding the patch for ppc.

Any significant regressions?

Apr 29 2019, 7:33 PM · Restricted Project
ZhangKang added a comment to D60811: [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS().

I guess that's right, technically; "getScalarSizeInBits() / 8" will round down, and getStoreSize() will round up. But still, please fix the code so it's clear that check is happening.

Apr 29 2019, 7:00 PM · Restricted Project
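
The rounding difference mentioned in the comment above can be shown with plain integer arithmetic. This is a minimal sketch, not LLVM code: bytesRoundedDown and bytesRoundedUp are hypothetical helpers that only model the behaviour the comment describes for "getScalarSizeInBits() / 8" and getStoreSize() respectively.

```cpp
// Models "size in bits / 8": integer division truncates, so it rounds down.
constexpr unsigned bytesRoundedDown(unsigned Bits) { return Bits / 8; }

// Models a store size in bytes: adding 7 before dividing rounds up.
constexpr unsigned bytesRoundedUp(unsigned Bits) { return (Bits + 7) / 8; }

// For byte-sized types the two agree.
static_assert(bytesRoundedDown(32) == 4 && bytesRoundedUp(32) == 4,
              "32-bit element: both give 4 bytes");

// For types that are not byte-sized they disagree.
static_assert(bytesRoundedDown(1) == 0 && bytesRoundedUp(1) == 1,
              "1-bit element: 0 bytes vs 1 byte");
static_assert(bytesRoundedDown(17) == 2 && bytesRoundedUp(17) == 3,
              "17-bit element: 2 bytes vs 3 bytes");
```

Because the two results only match for byte-sized element types, an explicit check for that case makes the intent clear instead of relying on the mismatch.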

Apr 27 2019

ZhangKang created D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.
Apr 27 2019, 9:44 AM · Restricted Project
ZhangKang updated the summary of D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.
Apr 27 2019, 9:44 AM · Restricted Project
ZhangKang created D61227: [NFC]][PowerPC] Use -check-prefixes to simplify the check in code-align.ll.
Apr 27 2019, 9:16 AM · Restricted Project

Apr 24 2019

ZhangKang created D61076: [NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads.
Apr 24 2019, 9:50 AM · Restricted Project

Apr 18 2019

ZhangKang committed rG009a21d2fdff: [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS() (authored by ZhangKang).
[PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS()
Apr 18 2019, 12:28 AM
ZhangKang committed rL358644: [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS().
[PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS()
Apr 18 2019, 12:22 AM
ZhangKang closed D60811: [PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS().
Apr 18 2019, 12:22 AM · Restricted Project

Apr 16 2019

ZhangKang added a comment to D60564: Changes for LLVM PPCISelLowering function combineBVOfConsecutiveLoads.

This patch has been abandoned; the new patch for this bug is in https://reviews.llvm.org/D60811.

Apr 16 2019, 9:53 PM · Restricted Project