
Carrot (Guozhi Wei)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 15 2015, 3:50 PM (363 w, 1 d)

Recent Activity

Tue, Jun 28

Carrot committed rG2fcc495549e1: [AArch64] Update test case. (authored by Carrot).
Tue, Jun 28, 6:41 PM · Restricted Project, Restricted Project
Carrot committed rGddc9e8861ccf: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce… (authored by Carrot).
Tue, Jun 28, 2:49 PM · Restricted Project, Restricted Project
Carrot closed D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency.
Tue, Jun 28, 2:49 PM · Restricted Project, Restricted Project

Mon, Jun 27

Carrot updated the diff for D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency.
Mon, Jun 27, 3:26 PM · Restricted Project, Restricted Project
Carrot added a comment to D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency.

Oh yeah, I see. The way MachineCombiner's logic is held in target independent files for target patterns always trips me up. To check - this doesn't need the changes from D125588 now? It works stand-alone?

This optimization doesn't depend on D125588; only the test case did. In the original test case several operands came directly from COPY instructions, and the wrong latency was computed for them. In the current version I added several instructions so that no operands of SUB/ADD come from COPY instructions.

Mon, Jun 27, 3:25 PM · Restricted Project, Restricted Project

Fri, Jun 24

Carrot added a comment to D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency.

If we add A - (B + C) are there any other patterns that would be similarly useful? This seems similar to the existing reassociate logic in the machine combiner, but AArch64InstrInfo::isAssociativeAndCommutative is not very thorough under AArch64. Would the new pattern be best to be marked as CombinerObjective::MustReduceDepth similar to the existing REASSOC patterns?

I think there are more similarly useful patterns, but I have only encountered this one. We can add other patterns when we have tests.

Fri, Jun 24, 3:34 PM · Restricted Project, Restricted Project
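
To illustrate the pattern discussed in D124564: the rewrite A-(B+C) => (A-B)-C keeps the same result under wrap-around integer arithmetic, and it shortens the dependency chain only when C becomes available later than A and B. The following is a minimal sketch, assuming a hypothetical single-cycle latency for ADD and SUB and made-up operand ready times; none of the names or numbers come from the patch itself.

```python
MASK64 = (1 << 64) - 1  # model 64-bit wrap-around registers


def sub_of_add(a, b, c):
    """Original form: A - (B + C)."""
    return (a - ((b + c) & MASK64)) & MASK64


def sub_of_sub(a, b, c):
    """Rewritten form: (A - B) - C."""
    return (((a - b) & MASK64) - c) & MASK64


def depth_original(ta, tb, tc, lat=1):
    """Cycle at which A - (B + C) is ready, given operand ready times."""
    add_ready = max(tb, tc) + lat
    return max(ta, add_ready) + lat


def depth_rewritten(ta, tb, tc, lat=1):
    """Cycle at which (A - B) - C is ready, given operand ready times."""
    first_sub_ready = max(ta, tb) + lat
    return max(first_sub_ready, tc) + lat


# C arrives late: A - B is computed while waiting for C.
print(depth_original(0, 0, 5), depth_rewritten(0, 0, 5))  # 7 6
# A arrives late: the rewrite would lengthen the chain instead.
print(depth_original(5, 0, 0), depth_rewritten(5, 0, 0))  # 6 7
```

The second case suggests why such a pattern is only worth applying when the combiner can confirm that the depth actually goes down.
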
Carrot updated the diff for D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency.

Add MIR tests.

Fri, Jun 24, 3:27 PM · Restricted Project, Restricted Project

Wed, Jun 22

Carrot updated the diff for D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency.

The cost model improvement has been split out. This patch is updated to contain only the new pattern.

Wed, Jun 22, 3:31 PM · Restricted Project, Restricted Project
Carrot added inline comments to D118301: [Spill2Reg][4/9] Added x86 profitability model..
Wed, Jun 22, 12:08 PM · Restricted Project, Restricted Project

Tue, Jun 21

Carrot added a comment to D125588: [MachineCombiner] Improve MachineCombiner's cost model.

It sounds to me like the MADD is decoded into two uops, mul and add. Then the mul can be immediately executable once the two multipliers are available. Is this correct?

Tue, Jun 21, 6:58 PM · Restricted Project, Restricted Project
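
The reading in the question above can be made concrete with a small latency model. This is only a sketch of that reading (a multiply uop followed by an add uop); the latencies and function names are hypothetical and do not describe any particular core or the cost model in D125588.

```python
def madd_ready_fused(t_m1, t_m2, t_acc, mul_lat=3, add_lat=1):
    """Single-op model: nothing starts until all three operands are ready."""
    return max(t_m1, t_m2, t_acc) + mul_lat + add_lat


def madd_ready_split(t_m1, t_m2, t_acc, mul_lat=3, add_lat=1):
    """Two-uop model: the multiply starts as soon as both multipliers are
    ready; only the add uop waits for the accumulator operand."""
    mul_done = max(t_m1, t_m2) + mul_lat
    return max(mul_done, t_acc) + add_lat


# Multipliers ready at cycle 0, accumulator ready at cycle 3.
print(madd_ready_fused(0, 0, 3), madd_ready_split(0, 0, 3))  # 7 4
```

Under the split model a late accumulator costs only the add latency, which is the kind of difference a depth-based cost comparison would need to account for.
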

Fri, Jun 17

Carrot added inline comments to D118300: [Spill2Reg][3/9] Code generation part 1..
Fri, Jun 17, 11:36 AM · Restricted Project, Restricted Project

Mon, Jun 13

Carrot added a comment to D125588: [MachineCombiner] Improve MachineCombiner's cost model.

ping

Mon, Jun 13, 9:42 AM · Restricted Project, Restricted Project

Mon, Jun 6

Carrot added a comment to D125588: [MachineCombiner] Improve MachineCombiner's cost model.

The test was run on a Neoverse N1 machine. I didn't specify -mcpu, so I guess it defaulted to -mcpu=generic.

Mon, Jun 6, 11:48 AM · Restricted Project, Restricted Project

May 31 2022

Carrot added a comment to D125588: [MachineCombiner] Improve MachineCombiner's cost model.

I tried SPEC2006 on an AArch64 machine, but for some unknown reason I couldn't run 464.h264ref correctly; even gcc-compiled code generates the same output.

May 31 2022, 7:25 PM · Restricted Project, Restricted Project

May 23 2022

Carrot added a comment to D125588: [MachineCombiner] Improve MachineCombiner's cost model.

I tested SPEC2006 int on my Skylake desktop.

May 23 2022, 11:43 AM · Restricted Project, Restricted Project

May 16 2022

Herald added a project to D106408: Allow rematerialization of virtual reg uses: Restricted Project.

@rampitec, any follow up on this patch?

May 16 2022, 11:24 AM · Restricted Project, Restricted Project

May 13 2022

Carrot requested review of D125588: [MachineCombiner] Improve MachineCombiner's cost model.
May 13 2022, 3:40 PM · Restricted Project, Restricted Project

May 12 2022

Carrot added a comment to D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency.

Hello. This looks like two different patches.

I did the MachineCombiner cost model improvement because it impacts my AArch64 test cases. But as you said, it's reasonable to send it as a separate patch. I will do that.

May 12 2022, 5:50 PM · Restricted Project, Restricted Project
Carrot added a comment to D119916: Add a machine function pass to convert binop(phi(constants), v) to phi(binop) .

Adding more potential machine pass reviewers.
Was the question from @nikic (about making this an IR pass) answered?

May 12 2022, 5:46 PM · Restricted Project, Restricted Project

May 11 2022

Carrot added reviewers for D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency: fhahn, snnw.
May 11 2022, 2:04 PM · Restricted Project, Restricted Project

May 10 2022

Carrot added a comment to D119916: Add a machine function pass to convert binop(phi(constants), v) to phi(binop) .

Any comments from other reviewers?

May 10 2022, 5:03 PM · Restricted Project, Restricted Project

May 6 2022

Carrot added a comment to D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency.

ping

May 6 2022, 7:17 PM · Restricted Project, Restricted Project

May 2 2022

Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

https://github.com/llvm/llvm-project/issues/55237 is filed.
We can move discussion to there.

May 2 2022, 3:26 PM · Restricted Project, Restricted Project

Apr 28 2022

Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

After I added another GVNHoist pass after the loop optimizations, the two "or" instructions were replaced by one "or" instruction hoisted into PreHeader2, and the instructions in LoopExit2 were finally vectorized by SLPVectorizer.

Apr 28 2022, 6:50 PM · Restricted Project, Restricted Project

Apr 27 2022

Carrot requested review of D124564: [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency.
Apr 27 2022, 3:29 PM · Restricted Project, Restricted Project

Apr 26 2022

Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

If I specify -rotation-max-header-size=2, loop rotation is disabled for Loop2, and the second LICM hoists the "or" instruction to PreHeader2, which dominates the following "or" instruction. GVN and SLPVectorizer then work as before, and I get vectorized instructions for LoopBody2.

Apr 26 2022, 7:15 PM · Restricted Project, Restricted Project

Apr 14 2022

Carrot added inline comments to D123748: [ValueTracking] Added support to deduce PHI Nodes values being a power of 2.
Apr 14 2022, 6:18 PM · Restricted Project, Restricted Project
Carrot added a comment to D123748: [ValueTracking] Added support to deduce PHI Nodes values being a power of 2.
Apr 14 2022, 6:17 PM · Restricted Project, Restricted Project

Apr 13 2022

Herald added a project to D118299: [Spill2Reg][2/9] This patch adds spill/reload collection.: Restricted Project.
Apr 13 2022, 12:40 PM · Restricted Project, Restricted Project

Apr 12 2022

Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

Now I understand how this patch caused the missing vectorization in our code. In my previous comment I showed that different GVN results caused different SLPVectorizer behavior. This time let's focus on how this patch produces different GVN results.

Apr 12 2022, 9:47 PM · Restricted Project, Restricted Project

Apr 4 2022

Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

I'm thinking of the following change, so that we can remove the PHI instruction without introducing any extra instructions on any path.

BB1:
  br %cond, label %BB2, label %BB3

BB2:
  ...
  %161 = or i64 %24, 1
  ...
  br label BBX

BB3:
  ...
  %179 = or i64 %24, 1
  ...
  br label BBX

BBX:
  // our interesting bb
  %245 = phi i64 [ %161, %BB2 ], [ %179, %BB3 ]
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...

==>

BB1:
  ...
  // New instruction is inserted here
  %245 = or i64 %24, 1
  br %cond, label %BB2, label %BB3

BB2:
  ...
  ...
  br label BBX

BB3:
  ...
  // Use of %245.
  ...
  br label BBX

BBX:
  // our interesting bb
  // PHI instruction is deleted.
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...

That kinda sounds like something for GVNHoist, which i think is currently still disabled due to some miscompilations?

Apr 4 2022, 2:24 PM · Restricted Project, Restricted Project
Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

There are two ways to view this:

  1. If all of the incoming values of the PHI are fully identical instructions with fully identical operands, then we don't need to PHI together the operands, and can replace the PHI with said instruction.
  2. The one-user check is there to ensure that the instruction count does not increase, so in principle, if we need to PHI together the operands, we need as many of the instructions to be one-user as the number of PHIs we need.
Apr 4 2022, 8:29 AM · Restricted Project, Restricted Project
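
The two cases above can be sketched as a small check over the PHI's incoming instructions. This is an illustrative model only: `Inst` and `operand_phis_needed` are hypothetical names, instructions are reduced to an opcode plus operand names, and the one-user bookkeeping is left out.

```python
from typing import List, NamedTuple, Tuple


class Inst(NamedTuple):
    opcode: str
    operands: Tuple[str, ...]


def operand_phis_needed(incoming: List[Inst]) -> List[int]:
    """Return the operand positions whose values differ across the incoming
    instructions. Raises ValueError if the opcodes or operand counts differ."""
    first = incoming[0]
    for inst in incoming[1:]:
        if inst.opcode != first.opcode or len(inst.operands) != len(first.operands):
            raise ValueError("incoming values are not the same operation")
    return [
        i
        for i in range(len(first.operands))
        if any(inst.operands[i] != first.operands[i] for inst in incoming[1:])
    ]


# Case 1: identical instructions, so no operand PHI is needed.
print(operand_phis_needed([Inst("or", ("%24", "1")), Inst("or", ("%24", "1"))]))  # []
# Case 2: operand 0 differs, so one new PHI would be needed for it.
print(operand_phis_needed([Inst("or", ("%24", "1")), Inst("or", ("%25", "1"))]))  # [0]
```

An empty result corresponds to the first view (the PHI can be replaced by a single instruction); a non-empty result gives the number of operand PHIs that the second view has to pay for.
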

Apr 1 2022

Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

If this is really as simple as all the incoming values being identical instructions with identical arguments, then it seems like a simple extension of the InstCombinerImpl::foldPHIArgOpIntoPHI()

Apr 1 2022, 5:04 PM · Restricted Project, Restricted Project
Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

I still failed to reproduce it in plain mode.

Apr 1 2022, 4:22 PM · Restricted Project, Restricted Project

Mar 25 2022

Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

Could you also share the input source file to reproduce the difference? I think this will be needed to investigate the difference.

Mar 25 2022, 5:42 PM · Restricted Project, Restricted Project
Carrot added a comment to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.

Before this patch I have a code snippet

     │ 80:   mov      -0x58(%rsp),%rdx
     │       mov      -0x60(%rsp),%r9
     │       mov      -0x40(%rsp),%r15
0.63 │ 8f:   mulps    %xmm8,%xmm9
0.44 │       movups   (%rdx,%r9,4),%xmm1
0.36 │       addps    %xmm9,%xmm1
0.25 │       movups   %xmm1,(%rdx,%r9,4)
0.75 │       mulps    %xmm8,%xmm3
0.41 │       movups   (%rdx,%r10,4),%xmm1
0.86 │       addps    %xmm3,%xmm1
0.70 │       movups   %xmm1,(%rdx,%r10,4)
0.43 │       add      $0x8,%r9
0.28 │       add      -0x48(%rsp),%rbx
0.07 │       cmp      %r15,%r9
0.14 │     ↓ jge      3e5
             ...

After this patch, it is changed to

0.33 │ 80:   mov      %r12,%rdx
0.32 │       or       $0x1,%rdx
0.32 │       mov      %r12,%rdi
0.39 │       or       $0x2,%rdi
0.38 │       mov      %r12,%rcx
0.30 │       or       $0x3,%rcx
0.37 │       mov      %r12,%rbp
0.39 │       or       $0x4,%rbp
0.31 │       mov      %r12,%r10
0.27 │       or       $0x5,%r10
0.31 │       mov      %r12,%r9
0.37 │       or       $0x6,%r9
0.29 │       mov      %r12,%r8
0.35 │       or       $0x7,%r8
0.34 │ b1:   mulss    %xmm8,%xmm13
0.39 │       addss    (%r11,%r12,4),%xmm13
0.33 │       movss    %xmm13,(%r11,%r12,4)
0.33 │       mulss    %xmm8,%xmm12
0.41 │       addss    (%r11,%rdx,4),%xmm12
0.38 │       movss    %xmm12,(%r11,%rdx,4)
0.31 │       mulss    %xmm8,%xmm3
0.39 │       addss    (%r11,%rdi,4),%xmm3
0.35 │       movss    %xmm3,(%r11,%rdi,4)
0.41 │       mulss    %xmm8,%xmm4
0.31 │       addss    (%r11,%rcx,4),%xmm4
0.31 │       movss    %xmm4,(%r11,%rcx,4)
0.34 │       mulss    %xmm8,%xmm5
0.41 │       addss    (%r11,%rbp,4),%xmm5
0.34 │       movss    %xmm5,(%r11,%rbp,4)
0.32 │       mulss    %xmm8,%xmm6
0.34 │       addss    (%r11,%r10,4),%xmm6
0.38 │       movss    %xmm6,(%r11,%r10,4)
0.35 │       mulss    %xmm8,%xmm7
0.43 │       addss    (%r11,%r9,4),%xmm7
0.38 │       movss    %xmm7,(%r11,%r9,4)
0.41 │       mulss    %xmm8,%xmm1
0.36 │       addss    (%r11,%r8,4),%xmm1
0.32 │       movss    %xmm1,(%r11,%r8,4)
0.39 │       add      $0x8,%r12
0.39 │       add      -0x18(%rsp),%rbx
0.02 │       cmp      -0x60(%rsp),%r12
0.31 │     ↓ jge      510
             ...
Mar 25 2022, 12:18 PM · Restricted Project, Restricted Project
Herald added a project to D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate: Restricted Project.

@wsmoses, this patch caused an Eigen regression in our code; it seems some originally vectorized code is now scalarized.
Could you revert it?

Mar 25 2022, 11:14 AM · Restricted Project, Restricted Project

Mar 22 2022

Carrot added a comment to D119916: Add a machine function pass to convert binop(phi(constants), v) to phi(binop) .

ping

Mar 22 2022, 4:12 PM · Restricted Project, Restricted Project

Mar 14 2022

Carrot added a comment to D119916: Add a machine function pass to convert binop(phi(constants), v) to phi(binop) .

ping

Mar 14 2022, 8:07 PM · Restricted Project, Restricted Project

Mar 7 2022

Herald added a project to D119916: Add a machine function pass to convert binop(phi(constants), v) to phi(binop) : Restricted Project.

ping

Mar 7 2022, 8:32 PM · Restricted Project, Restricted Project

Feb 24 2022

Carrot updated the diff for D119916: Add a machine function pass to convert binop(phi(constants), v) to phi(binop) .

Reformat.

Feb 24 2022, 7:09 PM · Restricted Project, Restricted Project

Feb 16 2022

Carrot added a comment to D119916: Add a machine function pass to convert binop(phi(constants), v) to phi(binop) .

My first question here would be whether it is preferable to do this as a machine-pass, or as a backend IR pass (which runs post-LSR at least, possibly part of CGP). Do you have any thoughts about what the trade-offs between those two options would be?

Feb 16 2022, 1:10 PM · Restricted Project, Restricted Project
Carrot added a comment to D119916: Add a machine function pass to convert binop(phi(constants), v) to phi(binop) .

I suppose this is the right place for this transformation, rather than in middle-end.
Is this the same transform as D117110? Some enhancements are missing there, but do we need both?

Feb 16 2022, 12:51 PM · Restricted Project, Restricted Project

Feb 15 2022

Carrot requested review of D119916: Add a machine function pass to convert binop(phi(constants), v) to phi(binop) .
Feb 15 2022, 10:55 PM · Restricted Project, Restricted Project

Feb 4 2022

Carrot accepted D118846: Clean up a test case..
Feb 4 2022, 12:33 PM · Restricted Project

Dec 29 2021

Carrot added a comment to D116058: [InstCombine] Convert binop(phi, v) to phi(binop) for constant phi operands.

ping

Dec 29 2021, 6:12 PM · Restricted Project

Dec 21 2021

Carrot added a comment to D116058: [InstCombine] Convert binop(phi, v) to phi(binop) for constant phi operands.

This reminds me of the SpeculateAroundPhis pass, see D37467 and D104099.

Dec 21 2021, 2:07 PM · Restricted Project
Carrot added a comment to D116058: [InstCombine] Convert binop(phi, v) to phi(binop) for constant phi operands.

immediate materialization instructions

There is no such thing in IR, so this sounds like it should be a back-end backtransform?

Dec 21 2021, 12:30 PM · Restricted Project
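
For readers unfamiliar with the transform named in D116058 and D119916, the equivalence between binop(phi(constants), v) and phi(binop) can be sketched as follows. This is a toy model, not the patch: predecessors are dictionary keys, the binary operation defaults to addition, and `v` is assumed to be available in every predecessor. All names are hypothetical.

```python
import operator


def binop_of_phi(pred, v, constants, op=operator.add):
    """Before: the merge block selects a constant via a PHI, then applies op."""
    phi = constants[pred]  # the PHI picks the constant for this predecessor
    return op(v, phi)


def phi_of_binop(pred, v, constants, op=operator.add):
    """After: each predecessor applies op with its own constant, and the
    merge block only selects the already-computed result."""
    # Only the taken predecessor would execute in practice; every entry is
    # computed here just to mirror the PHI's operand list.
    per_pred = {p: op(v, c) for p, c in constants.items()}
    return per_pred[pred]  # the PHI picks the result for this predecessor


constants = {"BB2": 1, "BB3": 8}
print(binop_of_phi("BB2", 40, constants), phi_of_binop("BB2", 40, constants))  # 41 41
print(binop_of_phi("BB3", 40, constants), phi_of_binop("BB3", 40, constants))  # 48 48
```

In the second form the constant is a direct operand of the operation in each predecessor, which is where the back end could fold it as an immediate instead of materializing it separately.
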

Dec 20 2021

Carrot requested review of D116058: [InstCombine] Convert binop(phi, v) to phi(binop) for constant phi operands.
Dec 20 2021, 3:28 PM · Restricted Project

Dec 9 2021

Carrot added a comment to D104099: [NewPM] Remove SpeculateAroundPHIs pass.

I'm trying to understand the problem in this pass.
It looks like both @thopre's and @lebedev.ri's problems are loop related. Are there any other regressions that are not loop related?

Dec 9 2021, 4:20 PM · Restricted Project, Restricted Project

Dec 8 2021

Carrot added a comment to D114832: [SROA] Improve SROA to prevent generating redundant coalescing operations..

Driven by the compile-time feedback, another thing for me to consider, (probably after a preliminary convergence on the pass to add analysis info or do the transform), is to prune the existing solution so it's faster on the benchmark.

Dec 8 2021, 3:56 PM · Restricted Project

Dec 6 2021

Carrot added a comment to D104099: [NewPM] Remove SpeculateAroundPHIs pass.

Finally, a complaint comes.

Dec 6 2021, 5:16 PM · Restricted Project, Restricted Project

Nov 29 2021

Carrot committed rGf1d8345a2ab3: [TwoAddressInstructionPass] Create register mapping for registers with multiple… (authored by Carrot).
Nov 29 2021, 7:05 PM
Carrot closed D113193: [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB.
Nov 29 2021, 7:05 PM · Restricted Project

Nov 22 2021

Carrot added a comment to D113193: [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB.

ping

Nov 22 2021, 5:35 PM · Restricted Project

Nov 15 2021

Carrot added inline comments to D113193: [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB.
Nov 15 2021, 12:04 PM · Restricted Project

Nov 12 2021

Carrot added a comment to D113193: [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB.

ping

Nov 12 2021, 3:42 PM · Restricted Project

Nov 9 2021

Carrot updated the diff for D113193: [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB.

Rebase.

Nov 9 2021, 3:25 PM · Restricted Project

Nov 4 2021

Carrot requested review of D113193: [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB.
Nov 4 2021, 8:40 AM · Restricted Project

Oct 28 2021

Carrot committed rG1e46dcb77b51: [TwoAddressInstructionPass] Put all new instructions into DistanceMap (authored by Carrot).
Oct 28 2021, 11:12 AM
Carrot closed D111857: 【TwoAddressInstructionPass】 Put all new instructions into DistanceMap.
Oct 28 2021, 11:12 AM · Restricted Project

Oct 26 2021

Carrot updated the diff for D111857: 【TwoAddressInstructionPass】 Put all new instructions into DistanceMap.

Thanks to pengfei's question, I found another case where DistanceMap is not maintained correctly when unfolding a memory operand.

Oct 26 2021, 5:51 PM · Restricted Project

Oct 25 2021

Carrot added inline comments to D111857: 【TwoAddressInstructionPass】 Put all new instructions into DistanceMap.
Oct 25 2021, 4:42 PM · Restricted Project

Oct 21 2021

Carrot added inline comments to D111857: 【TwoAddressInstructionPass】 Put all new instructions into DistanceMap.
Oct 21 2021, 8:41 AM · Restricted Project

Oct 20 2021

Carrot added inline comments to D111857: 【TwoAddressInstructionPass】 Put all new instructions into DistanceMap.
Oct 20 2021, 2:33 PM · Restricted Project

Oct 19 2021

Carrot updated the diff for D111857: 【TwoAddressInstructionPass】 Put all new instructions into DistanceMap.

Add a MIR test.

Oct 19 2021, 5:43 PM · Restricted Project

Oct 18 2021

Carrot added a comment to D111857: 【TwoAddressInstructionPass】 Put all new instructions into DistanceMap.

I encountered this problem when I was working on a different optimization. It causes a real problem in my work, but I can't find a test case for the current code base.

Oct 18 2021, 8:28 AM · Restricted Project

Oct 14 2021

Carrot requested review of D111857: 【TwoAddressInstructionPass】 Put all new instructions into DistanceMap.
Oct 14 2021, 5:41 PM · Restricted Project

Oct 11 2021

Carrot committed rG6599961c1707: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation (authored by Carrot).
Oct 11 2021, 3:32 PM
Carrot closed D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Oct 11 2021, 3:32 PM · Restricted Project

Oct 8 2021

Carrot added a comment to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.

ping

Oct 8 2021, 9:56 AM · Restricted Project

Oct 1 2021

Carrot added a comment to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.

I ran SPEC2006 on a Skylake desktop; the result is 38.2 vs 38.3, so no difference.
I also checked the overall impact on impacted test case

Oct 1 2021, 12:00 AM · Restricted Project

Sep 29 2021

Carrot updated the diff for D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.

rebase.
Any other comments?

Sep 29 2021, 5:09 PM · Restricted Project

Sep 27 2021

Carrot added inline comments to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Sep 27 2021, 7:32 PM · Restricted Project

Sep 23 2021

Carrot added inline comments to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Sep 23 2021, 5:17 PM · Restricted Project

Sep 22 2021

Carrot added inline comments to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Sep 22 2021, 5:51 PM · Restricted Project
Carrot updated the diff for D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Sep 22 2021, 5:51 PM · Restricted Project

Sep 21 2021

Carrot added inline comments to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Sep 21 2021, 2:33 PM · Restricted Project

Sep 20 2021

Carrot added inline comments to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Sep 20 2021, 11:29 AM · Restricted Project
Carrot updated the diff for D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Sep 20 2021, 11:29 AM · Restricted Project

Sep 17 2021

Carrot updated the diff for D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.

Rebase.

Sep 17 2021, 11:47 AM · Restricted Project

Sep 10 2021

Carrot added a comment to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.

ping

Sep 10 2021, 7:59 AM · Restricted Project

Sep 3 2021

Carrot added inline comments to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Sep 3 2021, 6:30 PM · Restricted Project

Sep 2 2021

Carrot added a comment to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.

Description makes it sound there are several changes here - do they stand on their own, or must they all happen all at once?

Sep 2 2021, 11:01 AM · Restricted Project

Sep 1 2021

Carrot added a comment to D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.

ping

Sep 1 2021, 12:45 PM · Restricted Project

Aug 25 2021

Carrot requested review of D108731: [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation.
Aug 25 2021, 2:40 PM · Restricted Project

Jul 28 2021

Carrot committed rG50b62731452c: [MBP] findBestLoopTopHelper should exit if OldTop is not a chain header (authored by Carrot).
Jul 28 2021, 7:03 PM
Carrot closed D106329: [MBP] findBestLoopTopHelper should exit if OldTop is not a chain header.
Jul 28 2021, 7:03 PM · Restricted Project

Jul 19 2021

Carrot requested review of D106329: [MBP] findBestLoopTopHelper should exit if OldTop is not a chain header.
Jul 19 2021, 6:04 PM · Restricted Project

Jul 16 2021

Carrot committed rG5609c8b60730: [X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB (authored by Carrot).
Jul 16 2021, 10:19 AM
Carrot closed D104684: [X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB.
Jul 16 2021, 10:19 AM · Restricted Project

Jul 15 2021

Carrot added a comment to D104684: [X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB.

I successfully bootstrapped a stage2 clang and also ran stage2-check-all without regressions.

Jul 15 2021, 12:00 AM · Restricted Project

Jul 8 2021

Carrot added a comment to D104684: [X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB.

ping

Jul 8 2021, 8:06 PM · Restricted Project

Jul 1 2021

Carrot added a comment to D104684: [X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB.

ping.

Jul 1 2021, 5:19 PM · Restricted Project
Carrot added a comment to D105275: [SLP]Fix gathering of the scalars by not ignoring UndefValues..

Thanks for the fix!
The changes to CreateInsertElement look good to me, but I'm not confident reviewing the other parts of the patch.

Jul 1 2021, 9:30 AM · Restricted Project

Jun 30 2021

Carrot added inline comments to D103458: [SLP]Improve gathering of scalar elements..
Jun 30 2021, 3:17 PM · Restricted Project
Carrot added a comment to D103458: [SLP]Improve gathering of scalar elements..

Alive does not agree with your analysis. Also, it is not SLP who merges undefs into poison, Builder.CreateInsertElement does this magic. Also, https://llvm.org/docs/LangRef.html#poison-values says that phis do not depend on the operands, so phi is not poisoned.

Thanks for the pointer. From the LLVM IR dump, the poison value was generated after the SLP pass; we may dig into it further.

Jun 30 2021, 1:03 PM · Restricted Project
Carrot added a comment to D103458: [SLP]Improve gathering of scalar elements..

The problem is exactly in the merging of two undefs into poison.

Jun 30 2021, 11:03 AM · Restricted Project

Jun 29 2021

Carrot added a comment to D103458: [SLP]Improve gathering of scalar elements..

Compile the following test case with

Jun 29 2021, 9:21 PM · Restricted Project

Jun 21 2021

Carrot requested review of D104684: [X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB.
Jun 21 2021, 8:52 PM · Restricted Project