This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] common chains to reuse offsets to reduce register pressure
ClosedPublic

Authored by shchenz on Aug 25 2021, 11:06 PM.

Details

Summary
//    common multiple chains for the load/stores with same offsets in the loop, 
//    so that we can reuse the offsets and reduce the register pressure in the                                                                         
//    loop. This transformation can also increase the loop ILP as now each chain
//    uses its own loop induction add/addi. But this will increase the number of
//    add/addi in the loop.                                                     
//
//    char *p;                                                                  
//    A1 = p + base1                                                                                                                                   
//    A2 = p + base1 + offset                                                   
//    B1 = p + base2                                                            
//    B2 = p + base2 + offset                                                   
//                                                                              
//    for (int i = 0; i < n; i++)                                               
//      unsigned long x1 = *(unsigned long *)(A1 + i);                          
//      unsigned long x2 = *(unsigned long *)(A2 + i)                           
//      unsigned long x3 = *(unsigned long *)(B1 + i);                          
//      unsigned long x4 = *(unsigned long *)(B2 + i);                          
//    }                                                                         
//                                                                              
//    to look like this:                                                        
//                                                                                  
//    A1_new = p + base1 // chain 1                                                 
//    B1_new = p + base2 // chain 2, now inside the loop, common offset is          
//                       // reused.                                                 
//                                                                                  
//    for (long long i = 0; i < n; i+=count) {                                      
//      unsigned long x1 = *(unsigned long *)(A1_new + i);                          
//      unsigned long x2 = *(unsigned long *)((A1_new + i) + offset);               
//      unsigned long x3 = *(unsigned long *)(B1_new + i);                          
//      unsigned long x4 = *(unsigned long *)((B1_new + i) + offset);               
//    }

Found some improvements for our internal benchmarks.

Diff Detail

Event Timeline

shchenz created this revision.Aug 25 2021, 11:06 PM
shchenz requested review of this revision.Aug 25 2021, 11:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2021, 11:06 PM
qiucf added a subscriber: qiucf.Aug 26 2021, 12:18 AM
qiucf added inline comments.
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
128

Better as EnableChainCommoning?

shchenz updated this revision to Diff 370162.Sep 1 2021, 10:15 PM
shchenz marked an inline comment as done.

address @qiucf comments and rebase

shchenz added inline comments.Sep 1 2021, 10:17 PM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
128

Good idea.

shchenz planned changes to this revision.Sep 22 2021, 2:22 AM

For a better review, I will split the patch into two patches:
1: NFC patch for the function outlining, like rewriteForBase and rewriteForBucketElement;
2: functionality change for chain common

jsji added inline comments.Oct 20 2021, 2:13 PM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
66

Can we avoid using ' in examples?

431

treat it with? treat it as?

506

Why this can not reuse collectCandidates?

512

We already collected both Dform and DQform buckets before, can we reuse some info to do early exit?

540

TotalCount? ReusableOffsetCount?

541

FirstGroupCount? NewGroupCount?

543

SawSeperator? SawDifferentOffset?

545

Do we have assumption about the order of Element offset? Can we add comments

555

How about 2 reusable offset in 20 offsets? Maybe we should introduce a heuristic to control whether we want to skip the bucket? eg: if the ratio of reusable is very low?

558

// Only one offset?

559

Why sqrt(EleNum)?

578

This is common to 552, can we just us different ChainNum w w/o SawSeperator?

582

Why we need this check here?

619

Can we introduce an option to control, and just set 4 to be the default?

Also, why we can't push this check into prepareBasesForCommoningChains?

627

Can we loop through BBChanged only?

644

How often we will get unsafe offsets? Can we just do early exit in the main loop below?

662

pistart? Can we use a more meaningful name.

shchenz updated this revision to Diff 381169.Oct 21 2021, 1:20 AM
shchenz marked 16 inline comments as done.

address @jsji comments

Thanks for your review. @jsji Updated accordingly.

llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
512

Good idea. We can share the info among all forms. I split this into a new patch D112196.

555

I added a new threshold MaxVarsChainCommon in the new patch which will prevent from commoning too many chains.

559

The minimal for (EleNum/x + x) (1 <= x <= EleNum and x is the chain number) should be when x == sqrt(EleNum)?

578

Maybe we can revisit this comment after I add more comments to this function to make it clear.

582

Suppose there are two chains, we need to make sure, each diff between two adjacent elements is the same among chains.
For example,

after we divided the bucket to two chains A {baseA, offset1, offset2, offset3, offset4}, B {baseB, offset5, offset6, offset7, offset8},
we need to make sure offset1 = offset5, offset2 == offset6, offset3 = offset7, offset4 = offset8.

So we can use two bases and 4 reused offsets.

627

yes, I think we can and we should. I will post another patch for this NFC change as this usage is also existing in updateFormPrep and dispFormPrep

644

I am not sure how often we get unsafe offset. Before we expand a SCEV, it is safe to call this function first. If necessary, I will do more investigation later to confirm for what kind of case, isSafeToExpandis needed

jsji added inline comments.Oct 21 2021, 8:32 AM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
67

offset is reused ? common offset is re-used?

71

Should we use ((A1_new + i) + offset ) to be clearer?

421

isChainCommoningCandidate , to be aligned with isDSFormCandidate etc.

487

Why we have this limit check here, but not in addOneCandidate?

Also we should at least emit some debug log if we are really discarding candidate due to bucket limit here.

490

This should be checked in collectCandiate, as isValidCandidate.

499

Can we also abstract this check so that we can also reuse addOneCandidate to do this.

Something like:

for (auto &B : Buckets) {
   const SCEV *Diff = SE->getMinusSCEV(LSCEV, B.BaseSCEV);
   if ( const auto *Diff = isValidDiff(Diff, LSCEV, B.BaseSCEV)) {
     B.Elements.push_back(BucketElement(Diff, MemI));
     FoundBucket = true;
     break;
   }
 }

then we use isValidConstantDiff for others, but isValidChainCommoningDiff for this.

isValidConstantDiff(Diff, LSCEV, B.BaseSCEV) ={
return dyn_cast<SCEVConstant>(Diff)
}

isValidChainCommoningDiff(Diff, LSCEV, B.BaseSCEV) ={

  assert(LSCEV && "Invalid SCEV for Ptr value.");

  // Don't mess up previous dform prepare.
  if (isa<SCEVConstant>(LSCEV))
    return null_ptr;

  // A single integer type offset.
  if (isa<SCEVUnknown>(LSCEV) && LSCEV->getType()->isIntegerTy())
    return Diff;

  const SCEVNAryExpr *ASCEV = dyn_cast<SCEVNAryExpr>(LSCEV);
  if (!ASCEV)
    return null_ptr;

  for (const SCEV *Op : ASCEV->operands())
    if (!Op->getType()->isIntegerTy())
      return null_ptr;

  return Diff;


}
512

D112196 looks good. But I think we can still do more for this? eg: If both Dform and DQform buckets are empty, then there is no point trying to run this commoning either?

584

How about this?

unsigned ChainNum=FirstOffsetReusedCount / FirstOffsetReusedCountInFirstChain;
if (!SawChainSeparater) 
  ChainNum = (unsigned)sqrt(EleNum);

 CBucket.ChainSize = (unsigned)(EleNum / ChainNum);

 // If this is not a perfect chain(eg: all elements can be put inside
 // commoning chains.), skip now.
 if (CBucket.ChainSize * ChainNum != EleNum)
     return false;

 if (SawChainSeparater) {
 // Check that the offset seqs are the same for all chains?
     for (unsigned i = 1; i < CBucket.ChainSize; i++)
         for (unsigned j = 1; j < ChainNum; j++)
            if (CBucket.Elements[i].Offset !=
                SE->getMinusSCEV(CBucket.Elements[i + j * CBucket.ChainSize].Offset,
                          CBucket.Elements[j * CBucket.ChainSize].Offset))
            return false;

  }

 for (unsigned i = 0; i < ChainNum; i++)
   CBucket.ChainBases.push_back(CBucket.Elements[i * CBucket.ChainSize]);
602

Not all elements?

606

// Check that the offset seqs are the same for all chains?

858

Since we are reusing collectCandidates, this check can be moved out to around line 1488.

882

Why we can't pass isValidChainCommoningCandidate as isValidCandate (yes, need to add one more LSCEV parameter to function prototype) , and pass`addOneCandidateForChainCommoning` as addOneCandiate function too?

Then we don't need the IsChainCommoning bool parameter and conditional check here.

jsji added inline comments.Oct 21 2021, 8:39 AM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
499
isValidChainCommoningDiff(Diff, LSCEV, B.BaseSCEV) ={


  assert(LSCEV && "Invalid SCEV for Ptr value.");

  if (cast<SCEVAddRecExpr>(B.BaseSCEV)->getStepRecurrence(*SE) !=
        cast<SCEVAddRecExpr>(LSCEV)->getStepRecurrence(*SE))
      return null_ptr;

  // Don't mess up previous dform prepare.
  if (isa<SCEVConstant>(LSCEV))
    return null_ptr;

  // A single integer type offset.
  if (isa<SCEVUnknown>(LSCEV) && LSCEV->getType()->isIntegerTy())
    return Diff;

  const SCEVNAryExpr *ASCEV = dyn_cast<SCEVNAryExpr>(LSCEV);
  if (!ASCEV)
    return null_ptr;

  for (const SCEV *Op : ASCEV->operands())
    if (!Op->getType()->isIntegerTy())
      return null_ptr;

  return Diff;


}
shchenz updated this revision to Diff 381492.Oct 22 2021, 2:40 AM
shchenz marked 14 inline comments as done.
shchenz edited the summary of this revision. (Show Details)

address @jsji comments

shchenz added inline comments.Oct 22 2021, 2:54 AM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
512

Not exactly, ds/dq forms require the diff between elements must be constants, but we can prepare for non-constant diffs for chain commoning.

jsji accepted this revision as: jsji.Oct 22 2021, 7:52 AM

LGTM. Thanks for working on this.

llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
154

Address clang-format warning please.

558

Can we name this different from line 1019? both are "loopprepare"?

830

This is a new check for updateform/d/dqform, make sure it is intended.

This revision is now accepted and ready to land.Oct 22 2021, 7:52 AM
This revision was landed with ongoing or failed builds.Oct 24 2021, 9:22 PM
This revision was automatically updated to reflect the committed changes.
shchenz marked 3 inline comments as done.

Thanks for your review. @jsji

llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
830

Yes, this is required for chain common but not needed for update/d/dqform rewrite.

ro added a subscriber: ro.Oct 29 2021, 2:01 AM

This patch broke the Solaris build:

/vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp: In member function ‘bool {anonymous}::PPCLoopInstrFormPrep::prepareBasesForCommoningChains({anonymous}::Bucket&)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp:497:37: error: call of overloaded ‘sqrt(unsigned int&)’ is ambiguous
  497 |     ChainNum = (unsigned)sqrt(EleNum);
      |                                     ^
In file included from /vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/math.h:24,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/bits/std_abs.h:40,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/cstdlib:77,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/bits/stl_algo.h:59,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/algorithm:62,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/Hashing.h:51,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/Optional.h:18,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/STLExtras.h:19,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCFrameLowering.h:15,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCSubtarget.h:16,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp:79:
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:220:21: note: candidate: ‘long double std::sqrt(long double)’
  220 |  inline long double sqrt(long double __X) { return __sqrtl(__X); }
      |                     ^~~~
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:186:15: note: candidate: ‘float std::sqrt(float)’
  186 |  inline float sqrt(float __X) { return __sqrtf(__X); }
      |               ^~~~
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:74:15: note: candidate: ‘double std::sqrt(double)’
   74 | extern double sqrt __P((double));
      |               ^~~~

Can be fixed e.g. by casting EleNum to double.

In D108750#3095793, @ro wrote:

This patch broke the Solaris build:

/vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp: In member function ‘bool {anonymous}::PPCLoopInstrFormPrep::prepareBasesForCommoningChains({anonymous}::Bucket&)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp:497:37: error: call of overloaded ‘sqrt(unsigned int&)’ is ambiguous
  497 |     ChainNum = (unsigned)sqrt(EleNum);
      |                                     ^
In file included from /vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/math.h:24,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/bits/std_abs.h:40,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/cstdlib:77,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/bits/stl_algo.h:59,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/algorithm:62,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/Hashing.h:51,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/Optional.h:18,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/STLExtras.h:19,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCFrameLowering.h:15,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCSubtarget.h:16,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp:79:
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:220:21: note: candidate: ‘long double std::sqrt(long double)’
  220 |  inline long double sqrt(long double __X) { return __sqrtl(__X); }
      |                     ^~~~
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:186:15: note: candidate: ‘float std::sqrt(float)’
  186 |  inline float sqrt(float __X) { return __sqrtf(__X); }
      |               ^~~~
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:74:15: note: candidate: ‘double std::sqrt(double)’
   74 | extern double sqrt __P((double));
      |               ^~~~

Can be fixed e.g. by casting EleNum to double.

Thanks for reporting. 7591d2103222d712b589600a7f7bc19683152bdd is committed