
cycheng (Chuang-Yu Cheng)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2015, 4:26 AM (204 w, 2 d)

Recent Activity

Sep 2 2016

cycheng added inline comments to D23155: Power9 - Part-word VSX integer scalar loads/stores and sign extend instructions.
Sep 2 2016, 4:06 AM · Restricted Project

Aug 16 2016

cycheng closed D23441: [ppc64] Don't apply sibling call optimization if callee has any byval arg.

Committed: r278900

Aug 16 2016, 8:29 PM
cycheng committed rL278900: [ppc64] Don't apply sibling call optimization if callee has any byval arg.
Aug 16 2016, 8:25 PM

Aug 12 2016

cycheng added a comment to D23441: [ppc64] Don't apply sibling call optimization if callee has any byval arg.

Note that gcc is able to do SCO when caller uses more stack space than callee. Please look at this example:

Aug 12 2016, 2:37 AM
cycheng retitled D23441 from an empty title to [ppc64] Don't apply sibling call optimization if callee has any byval arg.
Aug 12 2016, 2:17 AM

Aug 4 2016

cycheng added a reviewer for D23155: Power9 - Part-word VSX integer scalar loads/stores and sign extend instructions: tjablin.

Add Tom.

Aug 4 2016, 5:53 PM · Restricted Project

Jul 28 2016

cycheng added a comment to D20949: [AggressiveAntiDepBreaker] Don't change aliased registers of liveins to alive in StartBlock.

Hello everyone, any comments on this patch?

Jul 28 2016, 7:22 PM
cycheng added a comment to D20310: Teach LLVM about Power 9 D-Form VSX Instructions.

Does anybody have any comment on this patch?
Thanks!

Jul 28 2016, 7:21 PM
cycheng added a comment to D22194: Power9 - Add exploitation of oris/ori fusion.

Hi Ehsan,

Jul 28 2016, 7:20 PM
cycheng added a comment to D22193: Add macro-fusion hook in MIScheduler and support cluster instructions scheduling in PostRAScheduler.

For two of the three files that I checked, the changes here appear in D22194 as well. Did you mean to abandon this patch and use D22194 instead?

Jul 28 2016, 7:11 PM

Jul 15 2016

cycheng updated the diff for D22194: Power9 - Add exploitation of oris/ori fusion.
  • Fix a bug when traversing fusion-table
  • Support addi/{stdx,ldx} for Pwr9
Jul 15 2016, 3:46 AM

Jul 14 2016

cycheng updated the diff for D22194: Power9 - Add exploitation of oris/ori fusion.
  • Addressed Nemanjai's comments
  • Implement a table-based approach for managing fusion instruction patterns, which gives us a cleaner way to add new patterns, query patterns, and so on.
Jul 14 2016, 4:27 AM

Jul 13 2016

cycheng added inline comments to D22194: Power9 - Add exploitation of oris/ori fusion.
Jul 13 2016, 4:54 AM

Jul 11 2016

cycheng added inline comments to D22193: Add macro-fusion hook in MIScheduler and support cluster instructions scheduling in PostRAScheduler.
Jul 11 2016, 6:30 PM
cycheng added a comment to D22194: Power9 - Add exploitation of oris/ori fusion.

Can you explain why you need a custom mutator here? At first glance it looks like overriding TargetInstrInfo::shouldScheduleAdjacent() and using the default MacroFusion mutator is enough.

Jul 11 2016, 6:16 PM

Jul 10 2016

cycheng added a parent revision for D22194: Power9 - Add exploitation of oris/ori fusion: D22193: Add macro-fusion hook in MIScheduler and support cluster instructions scheduling in PostRAScheduler.
Jul 10 2016, 1:50 AM
cycheng added a child revision for D22193: Add macro-fusion hook in MIScheduler and support cluster instructions scheduling in PostRAScheduler: D22194: Power9 - Add exploitation of oris/ori fusion.
Jul 10 2016, 1:50 AM
cycheng retitled D22194 from an empty title to Power9 - Add exploitation of oris/ori fusion.
Jul 10 2016, 1:50 AM
cycheng retitled D22193 from an empty title to Add macro-fusion hook in MIScheduler and support cluster instructions scheduling in PostRAScheduler.
Jul 10 2016, 1:36 AM

Jun 23 2016

cycheng closed D21397: Teach SimplifyCFG to Create Switches from InstCombine Or Mask'd Comparisons.

Committed r273639 (On behalf of Tom)

Jun 23 2016, 7:07 PM
cycheng committed rL273639: Teaching SimplifyCFG to recognize the Or-Mask trick that InstCombine uses to.
Jun 23 2016, 7:06 PM

Jun 22 2016

cycheng updated the diff for D20949: [AggressiveAntiDepBreaker] Don't change aliased registers of liveins to alive in StartBlock.

Add explanation in StartBlock to address Hal's review comment.

Jun 22 2016, 12:31 AM

Jun 16 2016

cycheng closed D21440: Use m_APInt in SimplifyCFG.

Committed r272977 (On behalf of Tom)

Jun 16 2016, 5:12 PM
cycheng committed rL272977: Use m_APInt in SimplifyCFG.
Jun 16 2016, 5:11 PM
cycheng closed D21417: Fix Side-Conditions in SimplifyCFG for Creating Switches from InstCombine And Mask'd Comparisons.

Committed r272873 (On behalf of Tom)

Jun 16 2016, 1:45 AM

Jun 15 2016

cycheng committed rL272873: SimplifyCFG is able to detect the pattern:.
Jun 15 2016, 9:51 PM
cycheng added a comment to D20310: Teach LLVM about Power 9 D-Form VSX Instructions.

As we discussed, before you commit the change, please add -verify-machineinstrs to your regression tests. No need to upload the patch again. Thanks.

Jun 15 2016, 3:39 AM

Jun 14 2016

cycheng added a comment to D20949: [AggressiveAntiDepBreaker] Don't change aliased registers of liveins to alive in StartBlock.

[Discussion in mail]

Jun 14 2016, 3:17 AM
cycheng added a comment to D20949: [AggressiveAntiDepBreaker] Don't change aliased registers of liveins to alive in StartBlock.

[Discussion in mail]

Jun 14 2016, 2:57 AM
cycheng added a comment to D20310: Teach LLVM about Power 9 D-Form VSX Instructions.

Eliminating VSHRC brought up a new issue for me, but I have fixed it. Tom will upload the new patch later (the patch passed all of my testing on Pwr8).

Jun 14 2016, 2:35 AM

Jun 8 2016

cycheng added a comment to D20310: Teach LLVM about Power 9 D-Form VSX Instructions.

Yes, except that I'm not sure that we want to remove the register classes, just the register definitions themselves. This makes the change smaller, and also does not force us to add VSX-only data types to the Altivec register classes. In short, add VR0-31 directly to VSHRC. Does that make sense? Then you'll need some, but not all, of the changes you outline below.

Thanks again,
Hal

Jun 8 2016, 9:52 AM

Jun 4 2016

cycheng updated the diff for D20949: [AggressiveAntiDepBreaker] Don't change aliased registers of liveins to alive in StartBlock.

Fix test case to create correct live-in registers for next.bb that can demonstrate the issue.

Jun 4 2016, 8:38 PM

Jun 3 2016

cycheng retitled D20949 from an empty title to [AggressiveAntiDepBreaker] Don't change aliased registers of liveins to alive in StartBlock.
Jun 3 2016, 1:17 AM

Jun 1 2016

cycheng added a comment to D20310: Teach LLVM about Power 9 D-Form VSX Instructions.

Thanks for working on this. I'd really like to see a unified solution here, both for this and for the high half of the VSX register file in general (i.e. using this same scheme to eliminate the VSRH registers).

Jun 1 2016, 12:58 AM

May 20 2016

cycheng added a comment to D20310: Teach LLVM about Power 9 D-Form VSX Instructions.

Unfortunately, this still isn't quite the correct semantics. Although this will target the right physical registers, the encoding is wrong. These really are VR registers and have 5-bit fields in the encoding. Things like:

lxsd 35, 8(3)

are not likely to produce the desired results. These instructions need the VR register to be specified in the 0-31 range which will actually mean VSR 32-63.
As far as I can tell, the idea with these instructions is that we get scalar floating point values using the nice D-Form loads into the remaining VSR's (the FP D-Form loads can be used for VSRs 0-31).

I think perhaps the best way to handle these would be to define a new register class which will alias the VRRC registers, but has 64-bit spill size and can hold f64/f32.
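The numbering issue described above can be sketched in a few lines. This is only an illustration of the aliasing between VR n and VSR n+32 that the comment relies on; the helper names `vsrToVrField` and `vrFieldToVsr` are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helpers, for illustration only (not part of LLVM).
// VR n is the same physical register as VSR (n + 32), and the
// instruction encoding only has a 5-bit field for the VR number.

// Convert a VSR number to the 5-bit VR field value.
// VSR 0..31 have no VR alias, so there is nothing to return for them.
constexpr std::optional<unsigned> vsrToVrField(unsigned Vsr) {
  if (Vsr < 32 || Vsr > 63)
    return std::nullopt;
  return Vsr - 32;
}

// Convert a 5-bit VR field value (0..31) back to the VSR it denotes.
constexpr unsigned vrFieldToVsr(unsigned VrField) { return VrField + 32; }

// Small self-check of the mapping described in the comment above.
inline void checkVrVsrMapping() {
  // To reach VSR 35, the operand written in the instruction must be 3.
  assert(vsrToVrField(35).value() == 3);
  assert(vrFieldToVsr(3) == 35);
  // A low VSR cannot be named through the VR field at all.
  assert(!vsrToVrField(8).has_value());
}
```

Under this reading, an operand written as 35 does not fit the 5-bit field, which is why the register has to be given in the 0-31 range.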

May 20 2016, 4:59 PM
cycheng added a comment to D20310: Teach LLVM about Power 9 D-Form VSX Instructions.

This is going to need some additional work to restrict the register sets for all the instructions. Of course, these are scalar loads/stores but they're restricted to the upper 32 VSX registers (the VMX registers) so we can't use the full vsfrc/vssrc register classes.

May 20 2016, 4:08 AM
cycheng added a reviewer for D20310: Teach LLVM about Power 9 D-Form VSX Instructions: amehsan.

+ Ehsan

May 20 2016, 1:33 AM

May 19 2016

cycheng added inline comments to D19825: Power9 - Add exploitation of vector load and store that do not require swaps.
May 19 2016, 8:54 PM
cycheng added a comment to D20019: [PPC] exploitation of new xscmp*, as well as xsmaxcdp and xsmincdp.
Group1 Testcases:
define {double|float} @{max|min}_test{1|2}{_float}(%x, %y) #1
define {double|float} @{max|min}_test{1|2}{_float}_eq(%x, %y) #1
define {double|float} @fast_{max|min}_test{1|2}{_float}(%x, %y) #2
define {double|float} @fast_{max|min}_test{1|2}{_float}_eq(%x, %y) #2
Total: 8*4 = 32
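As a quick check of the count, the four name patterns can be expanded mechanically. This is a sketch under one assumption: the optional `_float` suffix is taken to go with the `float` return type, so each pattern yields 2 (max/min) x 2 (test1/test2) x 2 (double/float) = 8 names. The function `expandGroup1Names` is hypothetical and not part of the patch.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper: expands the four Group1 name patterns into
// concrete function names (without the leading '@').
std::vector<std::string> expandGroup1Names() {
  std::vector<std::string> Names;
  for (const char *Prefix : {"", "fast_"})        // plain vs. fast-math variants
    for (const char *Suffix : {"", "_eq"})        // plain vs. "_eq" variants
      for (const char *Op : {"max", "min"})
        for (const char *Idx : {"1", "2"})
          for (const char *Ty : {"", "_float"})   // assumed: "" = double, "_float" = float
            Names.push_back(std::string(Prefix) + Op + "_test" + Idx + Ty + Suffix);
  return Names;
}
```

The resulting list has 32 entries, matching the 8*4 total given above.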
May 19 2016, 2:54 AM

May 18 2016

cycheng abandoned D18030: [ppc64] Create instruction reorder chances in prologue and epilogue for post-RA-scheduler.

Abandon this patch since prologue/epilogue instruction order is not a major performance issue.

May 18 2016, 5:30 PM
cycheng abandoned D20092: [AMDGPU] Fix issues introduced by aggressive block placement.

D20017 has been abandoned, so I am abandoning this patch too.
We might need this patch again if "-force-precise-rotation-cost" is turned on by default and loop rotation is applied to this pattern:

May 18 2016, 5:19 PM
cycheng abandoned D20017: Aggressive choosing best loop top.

Abandon this patch because there is already a mature mechanism, "Precise (Loop) Rotation Cost", and I should rely on it.
Thanks for all of your help.

May 18 2016, 5:14 PM
cycheng added inline comments to D20019: [PPC] exploitation of new xscmp*, as well as xsmaxcdp and xsmincdp.
May 18 2016, 7:45 AM

May 12 2016

cycheng added a comment to D20017: Aggressive choosing best loop top.

I can test on PowerPC and x86, but I am not sure about the other backends :(

May 12 2016, 8:46 AM
cycheng planned changes to D20017: Aggressive choosing best loop top.

Yes, -force-precise-rotation-cost=true solves this issue, and the result looks better than this patch. I hope we can enable it by default.

May 12 2016, 8:27 AM

May 10 2016

cycheng closed D19564: Update Debug Intrinsics in RewriteUsesOfClonedInstructions in LoopRotation.

Committed r269034
On behalf of Tom.

May 10 2016, 6:40 PM
cycheng updated the diff for D20017: Aggressive choosing best loop top.

Fix AMDGPU test case failure

May 10 2016, 7:55 AM
cycheng added a child revision for D20092: [AMDGPU] Fix issues introduced by aggressive block placement: D20017: Aggressive choosing best loop top.
May 10 2016, 4:49 AM
cycheng added a parent revision for D20017: Aggressive choosing best loop top: D20092: [AMDGPU] Fix issues introduced by aggressive block placement.
May 10 2016, 4:49 AM
cycheng added inline comments to D20017: Aggressive choosing best loop top.
May 10 2016, 4:47 AM
cycheng committed rL269034: Update Debug Intrinsics in RewriteUsesOfClonedInstructions in LoopRotation.
May 10 2016, 2:51 AM
cycheng retitled D20092 from an empty title to [AMDGPU] Fix issues introduced by aggressive block placement.
May 10 2016, 2:04 AM

May 6 2016

cycheng retitled D20017 from an empty title to Aggressive choosing best loop top.
May 6 2016, 6:06 AM

Apr 26 2016

cycheng closed D19255: [ppc64] fix bug in prologue that mfocrf's cr operand should be explict state instead of implicit.

Committed r267660

Apr 26 2016, 8:05 PM
cycheng committed rL267660: [ppc64] fix bug in prologue that mfocrf's cr operand should be explict state….
Apr 26 2016, 8:05 PM
cycheng updated the diff for D19255: [ppc64] fix bug in prologue that mfocrf's cr operand should be explict state instead of implicit.

Update testcase

Apr 26 2016, 7:33 PM
cycheng closed D19316: [ppc64] Reenable sibling call optimization on ppc64 since fixed tsan library tail-call issue.

Committed r267527

Apr 26 2016, 1:37 AM
cycheng added a comment to D19255: [ppc64] fix bug in prologue that mfocrf's cr operand should be explict state instead of implicit.

Can you also add the testcase mentioned in the bug as a unit test?

Apr 26 2016, 1:36 AM
cycheng committed rL267527: [ppc64] Reenable sibling call optimization on ppc64 since fixed tsan library….
Apr 26 2016, 12:44 AM

Apr 20 2016

cycheng retitled D19316 from an empty title to [ppc64] Reenable sibling call optimization on ppc64 since fixed tsan library tail-call issue.
Apr 20 2016, 4:07 AM
cycheng closed D19148: Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.

Committed r266869

Apr 20 2016, 3:35 AM
cycheng committed rL266869: Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.
Apr 20 2016, 3:34 AM
cycheng updated the diff for D19148: Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.
Apr 20 2016, 3:22 AM
cycheng added a comment to D19148: Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.

It's all messy. The real problem is that sanitizer_print_stack_trace obtains the current PC and expects the PC to be in the stack trace after function calls. We don't prevent tail calls in sanitizer runtimes, so this assumption does not necessarily hold. For example, even if we add the ALWAYS_INLINE but introduce another function between sanitizer_print_stack_trace and PrintCurrentStackSlow (sanitizer_print_stack_trace calls Foo and Foo calls PrintCurrentStackSlow), then the test will become flaky again (sanitizer_print_stack_trace tail calls Foo and the PC disappears from the stack trace).
Another problem is that asan/msan does not turn off tail calls during compilation, so __sanitizer_print_stack_trace can be tail called, and then we get broken stack trace again.

Having said that, I don't see any simple, reliable solution. And anything complex is probably not worth it. So let's submit this fix.

Thanks for fixing this, btw.

Apr 20 2016, 1:56 AM
cycheng updated the diff for D19148: Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.

Rename the test file.

Apr 20 2016, 1:25 AM
cycheng added a comment to D19148: Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.
In D19148#405951, @kcc wrote:

Maybe you should instead inhibit the tail call optimization (by e.g. adding some dummy code after the call)?
Also, is a test possible for this change?

Apr 20 2016, 1:19 AM
cycheng updated the diff for D19148: Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.

Add a test case to check that PrintCurrentStackSlow is inlined as expected.

Apr 20 2016, 1:12 AM

Apr 19 2016

cycheng updated the diff for D19148: Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.

I just found that ALWAYS_INLINE is defined in "sanitizer_internal_defs.h"; use that instead of LLVM_ALWAYS_INLINE to avoid introducing an LLVM dependency.

Apr 19 2016, 1:00 AM

Apr 18 2016

cycheng retitled D19255 from an empty title to [ppc64] fix bug in prologue that mfocrf's cr operand should be explict state instead of implicit.
Apr 18 2016, 10:05 PM

Apr 14 2016

cycheng added a comment to D19148: Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.

This patch replaces D19001.

Apr 14 2016, 10:13 PM
cycheng abandoned D19001: [ppc64] Disable sibling-call-optimization when building tsan library by clang.

I submitted a new patch, http://reviews.llvm.org/D19148, to replace this one.
Thanks Hal!

Apr 14 2016, 10:06 PM
cycheng retitled D19148 from an empty title to Always inlining PrintCurrentStackSlow of tsan library to fix tail-call issue.
Apr 14 2016, 10:04 PM
cycheng added a comment to D19001: [ppc64] Disable sibling-call-optimization when building tsan library by clang.

Agree!

  1. First, I would like to update my investigations:
    • The current "BufferedStackTrace::LocatePcInTrace and MatchPc" mechanism is unable to handle a tail call to "__tsan::PrintCurrentStackSlow".
    • I used clang 3.9.0 on x86 to build the tsan library. It does tail-call "__tsan::PrintCurrentStackSlow", and I found that it passed the print-stack-trace.cc test only by luck. See: https://llvm.org/bugs/show_bug.cgi?id=27280#c3
Apr 14 2016, 3:57 AM

Apr 12 2016

cycheng committed rL266166: [PPC64][VSX] Add a couple of new data types for vec_vsx_ld and vec_vsx_st….
Apr 12 2016, 10:22 PM
cycheng added a comment to D19001: [ppc64] Disable sibling-call-optimization when building tsan library by clang.

Hi Hal,

Apr 12 2016, 3:36 AM

Apr 11 2016

cycheng closed D17749: [PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field.

Committed r266038.

Apr 11 2016, 8:20 PM
cycheng closed D18884: Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0.

Committed r266040
(On behalf of Tom)

Apr 11 2016, 8:18 PM
cycheng committed rL266040: [PPC64] Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0.
Apr 11 2016, 8:16 PM
cycheng committed rL266038: [PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field.
Apr 11 2016, 8:10 PM
cycheng retitled D19001 from an empty title to [ppc64] Disable sibling-call-optimization when building tsan library by clang.
Apr 11 2016, 6:53 PM

Apr 8 2016

cycheng closed D17533: CXX_FAST_TLS calling convention: performance improvement for PPC64.

Committed r265781
(On behalf of Tom)

Apr 8 2016, 5:11 AM
cycheng committed rL265781: CXX_FAST_TLS calling convention: performance improvement for PPC64.
Apr 8 2016, 5:11 AM

Apr 6 2016

cycheng planned changes to D18030: [ppc64] Create instruction reorder chances in prologue and epilogue for post-RA-scheduler.

No, please stop reviewing this patch.

Apr 6 2016, 6:31 PM
cycheng updated D18030: [ppc64] Create instruction reorder chances in prologue and epilogue for post-RA-scheduler.
Apr 6 2016, 4:39 AM
cycheng committed rL265528: [ppc64] Temporary disable sibling call optimization on ppc64 due to breaking….
Apr 6 2016, 3:54 AM

Apr 5 2016

cycheng closed D16315: [ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi.

Committed r265506

Apr 5 2016, 7:10 PM
cycheng committed rL265506: [ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi.
Apr 5 2016, 7:10 PM
cycheng closed D17929: [Power9] Implement copy-paste, msgsync, slb, and stop instructions.

Committed r265504

Apr 5 2016, 6:56 PM
cycheng closed D17885: [Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance.

Committed r265505

Apr 5 2016, 6:55 PM
cycheng committed rL265505: [Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random….
Apr 5 2016, 6:52 PM
cycheng committed rL265504: [Power9] Implement copy-paste, msgsync, slb, and stop instructions.
Apr 5 2016, 6:52 PM
cycheng committed rL265402: Add missing test for the "Don't delete empty preheaders" added in r265397.
Apr 5 2016, 7:27 AM
cycheng closed D16984: Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge.

On behalf of Tom.
Committed revision: r265397

Apr 5 2016, 7:15 AM
cycheng committed rL265397: Don't delete empty preheaders in CodeGenPrepare if it would create a critical….
Apr 5 2016, 7:12 AM

Apr 1 2016

cycheng added inline comments to D16315: [ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi.
Apr 1 2016, 3:50 AM
cycheng updated the diff for D16315: [ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi.

Fixed issues mentioned by Kit

Apr 1 2016, 3:49 AM

Mar 31 2016

cycheng closed D17606: [PPC64] Bug fix: when enable both sibling-call-opt and shrink-wrapping, the tail call branch instruction might disappear.

Committed r265112

Mar 31 2016, 11:52 PM
cycheng committed rL265112: [PPC64] Bug fix: when enabling sibling-call-opt and shrink-wrapping, the tail….
Mar 31 2016, 11:49 PM
cycheng updated the diff for D17885: [Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance.
  • Use ISA3.0 feature to guard new 3.0 instructions
  • combine XS_RS5_RA5_SH5 and XS_RS5_RA5_SH5o into one single multi-class
Mar 31 2016, 11:19 PM
cycheng closed D18448: Fix Sub-register Rewriting in Aggressive Anti-Dependence Breaker.

Committed r265097 (On behalf of Tom)

Mar 31 2016, 7:12 PM