This is an archive of the discontinued LLVM Phabricator instance.

[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate
ClosedPublic

Authored by wsmoses on Feb 16 2022, 11:37 AM.

Details

Summary

LICM will speculatively hoist code outside of loops. This requires dropping information, such as alias analysis (https://github.com/llvm/llvm-project/issues/53794) and range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249, LICM was only run after LoopRotate. Running LoopRotate prior to LICM prevents an instruction hoist from being speculative if the instruction was conditionally executed within the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they would not be speculative after LoopRotate. This destroys information, and discarding this additional information results in performance losses.

This PR modifies LICM to accept a "speculative" parameter which controls whether LICM may perform information-losing speculative hoists. Phase ordering is then modified to not perform the information-losing speculative hoists until after LoopRotate is performed, preserving this additional information.

Diff Detail

Event Timeline

wsmoses created this revision.Feb 16 2022, 11:37 AM
wsmoses requested review of this revision.Feb 16 2022, 11:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2022, 11:37 AM
wsmoses added a subscriber: vchuravy.

I'm currently looking into a complete revert, hang on...

Ok, that doesn't work.

Please rebase this patch, and update affected tests (no new tests are needed here):

  • The subject is missing "in" before "LICM".
  • The parameter name should be AllowSpeculation, and it should be a bool.
llvm/lib/Passes/PassBuilderPipelines.cpp
296–299

Please update the comment

wsmoses updated this revision to Diff 409690.Feb 17 2022, 9:20 AM

Address comments and rebase

wsmoses retitled this revision from [LICM][PhaseOrder] Don't speculate LICM until after running loop rotate to [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.Feb 17 2022, 9:21 AM
lebedev.ri added inline comments.Feb 17 2022, 9:32 AM
llvm/include/llvm/Transforms/Scalar/LICM.h
49

bool

69

bool

llvm/include/llvm/Transforms/Utils/LoopUtils.h
174–175

This sounds worse than it is.

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
910

Could we please be consistent with /*AllowSpeculation=*/ vs /*AllowSpeculation*/

llvm/lib/Transforms/Scalar/LICM.cpp
206

Elsewhere LicmAllowSpeculation doesn't have a default value.
Should we be consistent with that?

wsmoses marked 5 inline comments as done.Feb 17 2022, 9:44 AM
wsmoses added inline comments.
llvm/lib/Transforms/Scalar/LICM.cpp
206

This constructor in particular needs a default value since it is constructed without arguments (and all other arguments have default values).

wsmoses updated this revision to Diff 409697.Feb 17 2022, 9:44 AM

Fix bool type, and comments

lebedev.ri added inline comments.Feb 17 2022, 9:51 AM
llvm/include/llvm/Transforms/Utils/LoopUtils.h
211

Same comment update (and elsewhere)

wsmoses updated this revision to Diff 409702.Feb 17 2022, 9:56 AM

Clarify speculative comment

wsmoses marked an inline comment as done.Feb 17 2022, 9:57 AM
lebedev.ri accepted this revision.Feb 17 2022, 10:03 AM

LGTM, thanks.

This revision is now accepted and ready to land.Feb 17 2022, 10:03 AM
wsmoses updated this revision to Diff 409717.Feb 17 2022, 10:41 AM

Set default LICM to speculate on new PM

wsmoses updated this revision to Diff 409736.Feb 17 2022, 11:35 AM

Update AArch tests

wsmoses updated this revision to Diff 409802.Feb 17 2022, 3:18 PM

Fix tests

This revision was landed with ongoing or failed builds.Feb 17 2022, 5:13 PM
This revision was automatically updated to reflect the committed changes.
Carrot added a subscriber: Carrot.Mar 25 2022, 11:14 AM

@wsmoses, this patch caused an Eigen regression in our code; it seems some originally vectorized code is now scalarized.
Could you revert it?

Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2022, 11:14 AM

What precisely is the regression and the origin of it? This patch essentially aims to restore the information which was lost during a prior Phase Ordering change in https://reviews.llvm.org/D99249 which caused a large number of performance regressions. What is the behavior of your code prior to https://reviews.llvm.org/D99249 landing?

@wsmoses, this patch caused an Eigen regression in our code; it seems some originally vectorized code is now scalarized.
Could you revert it?

This patch fixed many regressions, so it would be a bad trade-off to rush things and revert it.

It would be a more productive approach if you could prepare a small repro, or at least some steps to reproduce your issue.

Before this patch I have a code snippet

     │ 80:   mov      -0x58(%rsp),%rdx                                                                                                                                                      
     │       mov      -0x60(%rsp),%r9                                                                                                                                                       
     │       mov      -0x40(%rsp),%r15                                                                                                                                                      
0.63 │ 8f:   mulps    %xmm8,%xmm9                                                                                                                                                           
0.44 │       movups   (%rdx,%r9,4),%xmm1                                                                                                                                                    
0.36 │       addps    %xmm9,%xmm1                                                                                                                                                           
0.25 │       movups   %xmm1,(%rdx,%r9,4)                                                                                                                                                    
0.75 │       mulps    %xmm8,%xmm3                                                                                                                                                           
0.41 │       movups   (%rdx,%r10,4),%xmm1                                                                                                                                                   
0.86 │       addps    %xmm3,%xmm1                                                                                                                                                           
0.70 │       movups   %xmm1,(%rdx,%r10,4)                                                                                                                                                   
0.43 │       add      $0x8,%r9                                                                                                                                                              
0.28 │       add      -0x48(%rsp),%rbx                                                                                                                                                      
0.07 │       cmp      %r15,%r9                                                                                                                                                              
0.14 │     ↓ jge      3e5                                             
             ...

After this patch, it is changed to

0.33 │ 80:   mov      %r12,%rdx
0.32 │       or       $0x1,%rdx
0.32 │       mov      %r12,%rdi
0.39 │       or       $0x2,%rdi
0.38 │       mov      %r12,%rcx
0.30 │       or       $0x3,%rcx
0.37 │       mov      %r12,%rbp
0.39 │       or       $0x4,%rbp
0.31 │       mov      %r12,%r10
0.27 │       or       $0x5,%r10
0.31 │       mov      %r12,%r9
0.37 │       or       $0x6,%r9
0.29 │       mov      %r12,%r8
0.35 │       or       $0x7,%r8
0.34 │ b1:   mulss    %xmm8,%xmm13
0.39 │       addss    (%r11,%r12,4),%xmm13
0.33 │       movss    %xmm13,(%r11,%r12,4)
0.33 │       mulss    %xmm8,%xmm12
0.41 │       addss    (%r11,%rdx,4),%xmm12
0.38 │       movss    %xmm12,(%r11,%rdx,4)
0.31 │       mulss    %xmm8,%xmm3
0.39 │       addss    (%r11,%rdi,4),%xmm3
0.35 │       movss    %xmm3,(%r11,%rdi,4)
0.41 │       mulss    %xmm8,%xmm4
0.31 │       addss    (%r11,%rcx,4),%xmm4
0.31 │       movss    %xmm4,(%r11,%rcx,4)
0.34 │       mulss    %xmm8,%xmm5
0.41 │       addss    (%r11,%rbp,4),%xmm5
0.34 │       movss    %xmm5,(%r11,%rbp,4)
0.32 │       mulss    %xmm8,%xmm6
0.34 │       addss    (%r11,%r10,4),%xmm6
0.38 │       movss    %xmm6,(%r11,%r10,4)
0.35 │       mulss    %xmm8,%xmm7
0.43 │       addss    (%r11,%r9,4),%xmm7
0.38 │       movss    %xmm7,(%r11,%r9,4)
0.41 │       mulss    %xmm8,%xmm1
0.36 │       addss    (%r11,%r8,4),%xmm1
0.32 │       movss    %xmm1,(%r11,%r8,4)
0.39 │       add      $0x8,%r12
0.39 │       add      -0x18(%rsp),%rbx
0.02 │       cmp      -0x60(%rsp),%r12
0.31 │     ↓ jge      510
      ...
fhahn added a comment.Mar 25 2022, 2:25 PM

Before this patch I have a code snippet

│ 80:   mov      -0x58(%rsp),%rdx                                                                                                                                                      
│       mov      -0x60(%rsp),%r9

Could you also share the input source file to reproduce the difference? I think this will be needed to investigate the difference.

Could you also share the input source file to reproduce the difference? I think this will be needed to investigate the difference.

The original build configuration is a combination of ThinLTO and FDO. I'm trying to get a separate reduced test case.

fhahn added a comment.Mar 28 2022, 8:54 AM

Could you also share the input source file to reproduce the difference? I think this will be needed to investigate the difference.

The original build configuration is a combination of ThinLTO and FDO. I'm trying to get a separate reduced test case.

That would be very helpful!

Carrot added a comment.Apr 1 2022, 4:22 PM

I still failed to reproduce it in plain mode.

But now I understand the problem more clearly. It looks like this patch triggered some inefficiency in the following optimizations.

After the LICM pass, the two versions of the IR differ significantly, but our interesting BB is the same.

%236 = fmul float %162, %6, !dbg !2160
%237 = mul nsw i64 %19, %5, !dbg !2161
%238 = getelementptr inbounds float, float* %4, i64 %237, !dbg !2162
%239 = load float, float* %238, align 4, !dbg !2163
%240 = fadd float %239, %236, !dbg !2163
store float %240, float* %238, align 4, !dbg !2163
%241 = fmul float %163, %6, !dbg !2164
%242 = or i64 %19, 1, !dbg !2165                                             // *
%243 = mul nsw i64 %242, %5, !dbg !2166
%244 = getelementptr inbounds float, float* %4, i64 %243, !dbg !2167
%245 = load float, float* %244, align 4, !dbg !2168
%246 = fadd float %245, %241, !dbg !2168
store float %246, float* %244, align 4, !dbg !2168
%247 = fmul float %164, %6, !dbg !2169
%248 = or i64 %19, 2, !dbg !2170                                             // *
%249 = mul nsw i64 %248, %5, !dbg !2171
%250 = getelementptr inbounds float, float* %4, i64 %249, !dbg !2172
%251 = load float, float* %250, align 4, !dbg !2173
%252 = fadd float %251, %247, !dbg !2173
store float %252, float* %250, align 4, !dbg !2173
%253 = fmul float %165, %6, !dbg !2174
%254 = or i64 %19, 3, !dbg !2175                                            // *
%255 = mul nsw i64 %254, %5, !dbg !2176
%256 = getelementptr inbounds float, float* %4, i64 %255, !dbg !2177
...

Notice those "or" instructions: they are used together with the following mul/GEP instructions to access consecutive array elements.

In the old version of the IR, the loop header contains the same group of "or" instructions. GVNPass finds this, deletes these "or" instructions in our interesting BB, and reuses the results of the "or" instructions in the loop header. Later, SLPVectorizer can still see that the GEP instructions compute consecutive memory addresses, and it vectorizes this BB.

In the new version of the IR, the loop header doesn't contain those "or" instructions; instead, one of the predecessors of this BB contains them. They look like

BB1:
   br %cond, label %BB2, label %BB3

BB2:
   ...
   br label BBX

BB3:
   ...
  %179 = or i64 %24, 1
  ...
   br label BBX

BBX:
   // our interesting bb
  ...
  %242 = or i64 %24, 1, !dbg !2165
  ...

Then GVN inserts "or" instructions into BB2 and replaces the "or" instructions in BBX with PHIs.

BB1:
   br %cond, label %BB2, label %BB3

BB2:
   ...
  %161 = or i64 %24, 1
  ...
   br label BBX

BB3:
   ...
  %179 = or i64 %24, 1
  ...
   br label BBX

BBX:
   // our interesting bb
  %245 = phi i64 [ %161, %BB2 ], [ %179, %BB3 ]
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...

Although all of the PHI's operands have the same value, SLPVectorizer can't recognize this, so it can't figure out that the GEPs compute consecutive memory addresses, and it fails to vectorize this BB.

There are 3 potential solutions:

  • GVNPass: if a new PHI's operands all have the same value, we can move one of them to the dominator and delete the PHI and its operands.
  • InstCombinePass: do the same thing described above, but as cleanup work in a later pass.
  • SLPVectorizerPass: we can teach it to look into PHI operands; since a PHI's operands may have the same value, we can get more useful information from them.

Which method do you think is better?

If this is really as simple as all the incoming values being identical instructions with identical arguments, then it seems like a simple extension of the InstCombinerImpl::foldPHIArgOpIntoPHI()

Carrot added a comment.Apr 1 2022, 5:04 PM

If this is really as simple as all the incoming values being identical instructions with identical arguments, then it seems like a simple extension of the InstCombinerImpl::foldPHIArgOpIntoPHI()

foldPHIArgOpIntoPHI requires that all operands are used only by the PHI. In our case the "or" instruction in BB3 has other users.

If this is really as simple as all the incoming values being identical instructions with identical arguments, then it seems like a simple extension of the InstCombinerImpl::foldPHIArgOpIntoPHI()

foldPHIArgOpIntoPHI requires that all operands are used only by the PHI. In our case the "or" instruction in BB3 has other users.

There are two ways to view this:

  1. If all of the incoming values of the PHI are fully identical instructions with fully identical operands, then we don't need to PHI together the operands, and we can replace the PHI with said instruction.
  2. The one-user check there is there to ensure that the instruction count does not increase, so in principle, if we do need to PHI together the operands, we need as many of the instructions to be single-user as the number of PHIs we create.
Carrot added a comment.Apr 4 2022, 8:29 AM

There are two ways to view this:

  1. If all of the incoming values of the PHI are fully identical instructions with fully identical operands, then we don't need to PHI together the operands, and we can replace the PHI with said instruction.
  2. The one-user check there is there to ensure that the instruction count does not increase, so in principle, if we do need to PHI together the operands, we need as many of the instructions to be single-user as the number of PHIs we create.

I'm thinking of the following change, so we can remove the PHI instruction without introducing any extra instructions on any path.

BB1:
   br %cond, label %BB2, label %BB3

BB2:
   ...
  %161 = or i64 %24, 1
  ...
   br label BBX

BB3:
   ...
  %179 = or i64 %24, 1
  ...
   br label BBX

BBX:
   // our interesting bb
  %245 = phi i64 [ %161, %BB2 ], [ %179, %BB3 ]
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...

==>

BB1:
   ...
   // New instruction is inserted here
   %245 = or i64 %24, 1
   br %cond, label %BB2, label %BB3

BB2:
   ...
  ...
   br label BBX

BB3:
   ...
  // Use of %245.
  ...
   br label BBX

BBX:
  // our interesting bb
  // PHI instruction is deleted.
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...

There are two ways to view this; [...]

I'm thinking of the following change, so we can remove the PHI instruction without introducing any extra instructions on any path. [...]

That kinda sounds like something for GVNHoist, which I think is currently still disabled due to some miscompilations?

That kinda sounds like something for GVNHoist, which I think is currently still disabled due to some miscompilations?

I believe that the known miscompilations are fixed, but major perf regressions were found, so it stays disabled.

Carrot added a comment.Apr 4 2022, 2:24 PM

I'm thinking of the following change, so we can remove the PHI instruction without introducing any extra instructions on any path. [...]
That kinda sounds like something for GVNHoist, which I think is currently still disabled due to some miscompilations?

I tried GVNHoist; it works as expected and moves the "or" instruction to BB1.

But BB1 is in a loop, and later LICM moves the "or" instruction to the loop preheader. Unfortunately another identical "or" instruction exists in one of its predecessors, so I ran into the same problem as before GVNHoist, except this time there is no GVNHoist run to hoist the "or" instruction, and the later GVNPass creates a PHI for it.

Can I add another GVNHoist pass after the loop optimizations?

Carrot added a comment.EditedApr 12 2022, 9:47 PM

Now I understand how this patch caused the missing vectorization in our code. In my previous comment I analyzed how the different GVN results caused different SLPVectorizer behavior. This time let's focus on how this patch produces different GVN results.

The following is simplified IR before the first LICM. The original code is much more complex, and multiple optimizations are involved, so only the related instructions and control flow are listed.

LoopHeader1:
  br %cond1, label %PreHeader2, label %LoopExit1

PreHeader2:
  br label %LoopHeader2

LoopHeader2:
  br %cond2, label %LoopBody2, label %LoopExit2

LoopBody2:
  %100 = or i64 %24, 1
  ... // uses of %100
  br label %LoopHeader2

LoopExit2:
  %200 = or i64 %24, 1
  ... // uses of %200
  br label %LoopHeader1

Without this patch:

  • First LICM of loop2: the definition of %100 is moved to PreHeader2, so now the definition of %100 dominates %200.
  • LoopRotate of loop1: LoopHeader1 is duplicated into its predecessors and deleted, and PreHeader2 becomes the new loop header of loop1, so now it's even more obvious that %100 dominates %200.
  • GVN: because the definition of %100 dominates %200, %200 is deleted and all uses of %200 are replaced by %100.

With this patch, we have the following different behavior:

  • First LICM of loop2: speculation is disabled, the definition of %100 is not moved, so the code is not changed.
  • LoopRotate of loop2: LoopHeader2 is duplicated into its predecessors and deleted. Note that a new preheader for loop2 is created. Now we have the following code:
LoopHeader1:
  br %cond1, label %PreHeader2, label %LoopExit1

PreHeader2:
  br %cond2, label %NewPreHeader2, label %LoopExit2

NewPreHeader2:
  br label %LoopBody2

LoopBody2:
  %100 = or i64 %24, 1
  ... // uses of %100
  br %cond2, label %LoopBody2, label %_crit_edge

_crit_edge:
  br label %LoopExit2

LoopExit2:
  %200 = or i64 %24, 1
  ... // uses of %200
  br label %LoopHeader1
  • Second LICM of loop2: this time the definition of %100 is moved to NewPreHeader2, but it doesn't dominate %200:
LoopHeader1:
  br %cond1, label %PreHeader2, label %LoopExit1

PreHeader2:
  br %cond2, label %NewPreHeader2, label %LoopExit2

NewPreHeader2:
  %100 = or i64 %24, 1
  br label %LoopBody2

LoopBody2:
  ... // uses of %100
  br %cond2, label %LoopBody2, label %_crit_edge

_crit_edge:
  br label %LoopExit2

LoopExit2:
  %200 = or i64 %24, 1
  ... // uses of %200
  br label %LoopHeader1
  • LoopRotate of loop1: LoopHeader1 is duplicated into its predecessors and deleted, and PreHeader2 becomes the new loop header of loop1. %100 still doesn't dominate %200:
NewPreHeader1:
  br label %PreHeader2

PreHeader2:                        // It's actually loop header of loop1
  br %cond2, label %NewPreHeader2, label %LoopExit2

NewPreHeader2:
  %100 = or i64 %24, 1
  br label %LoopBody2

LoopBody2:
  ... // uses of %100
  br %cond2, label %LoopBody2, label %_crit_edge

_crit_edge:
  br label %LoopExit2

LoopExit2:
  %200 = or i64 %24, 1
  ... // uses of %200
  br %cond1, label %PreHeader2, label %LoopExit1
  • GVN: the definition of %100 can reach %200 but doesn't dominate it, so GVN adds a new definition on the other path, PreHeader2 -> LoopExit2. That is a critical edge, so it is split; a new definition of the "or" instruction is inserted in the new BB, and a PHI instruction is inserted in LoopExit2:
NewPreHeader1:
  br label %PreHeader2

PreHeader2:                        // It's actually loop header of loop1
  br %cond2, label %NewPreHeader2, label %LoopExit2.crit_edge

NewPreHeader2:
  %100 = or i64 %24, 1
  br %LoopBody2

LoopBody2:
  ... // uses of %100
  br %cond2, label %LoopBody2, label %_crit_edge

_crit_edge:
  br label %LoopExit2

LoopExit2.crit_edge:
  %150 = or i64 %24, 1
  br label %LoopExit2
  
LoopExit2:
  %200 = phi i64 [%150, LoopExit2.crit_edge], [%100, _crit_edge]
  ... // uses of %200
  br %cond1, label %PreHeader2, label %LoopExit1

This is how we end up with different IR at the end of GVN, and later SLPVectorizer makes a different decision with this IR.

Any suggestions on how to fix it?

If I specify -rotation-max-header-size=2, loop rotation is disabled for Loop2, and the second LICM hoists the "or" instruction to PreHeader2, which dominates the following "or" instruction. Then GVN and SLPVectorizer work as before, and I get vectorized instructions for LoopBody2.

But the performance is not restored; I get more register spills. Loop rotation can change Loop2 into a single-block loop, which benefits many other optimizations, so it is also important for our Eigen code.

I added another GVNHoist pass after the loop optimizations; then the two "or" instructions are replaced by one "or" instruction hoisted into PreHeader2, and finally the instructions in LoopExit2 get vectorized by SLPVectorizer.

Any comments on this method?

Since I feel like this recent conversation is distinct from the PR, I think it deserves its own issue for visibility and tracking.

@Carrot how would you feel about opening a distinct issue on Github, providing all of the relevant context, and tagging the relevant folks?

xbolva00 added a comment.EditedApr 29 2022, 1:12 AM

Last I saw, GVNHoist was disabled due to perf regressions. Then a new effort started to reimplement it - https://lists.llvm.org/pipermail/llvm-dev/2021-September/152665.html - but it seems dead now.

So you may improve your specific loop with GVNHoist, but you will regress something else.

Carrot added a comment.May 2 2022, 3:26 PM

https://github.com/llvm/llvm-project/issues/55237 is filed.
We can move the discussion there.

Allen added a subscriber: Allen.May 3 2022, 12:32 AM