This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] common chains to reuse offsets to reduce register pressure
ClosedPublic

Authored by shchenz on Aug 25 2021, 11:06 PM.

Details

Summary
//    common multiple chains for the load/stores with same offsets in the loop, 
//    so that we can reuse the offsets and reduce the register pressure in the                                                                         
//    loop. This transformation can also increase the loop ILP as now each chain
//    uses its own loop induction add/addi. But this will increase the number of
//    add/addi in the loop.                                                     
//
//    char *p;                                                                  
//    A1 = p + base1                                                                                                                                   
//    A2 = p + base1 + offset                                                   
//    B1 = p + base2                                                            
//    B2 = p + base2 + offset                                                   
//                                                                              
//    for (int i = 0; i < n; i++)                                               
//      unsigned long x1 = *(unsigned long *)(A1 + i);                          
//      unsigned long x2 = *(unsigned long *)(A2 + i)                           
//      unsigned long x3 = *(unsigned long *)(B1 + i);                          
//      unsigned long x4 = *(unsigned long *)(B2 + i);                          
//    }                                                                         
//                                                                              
//    to look like this:                                                        
//                                                                                  
//    A1_new = p + base1 // chain 1                                                 
//    B1_new = p + base2 // chain 2, now inside the loop, common offset is          
//                       // reused.                                                 
//                                                                                  
//    for (long long i = 0; i < n; i+=count) {                                      
//      unsigned long x1 = *(unsigned long *)(A1_new + i);                          
//      unsigned long x2 = *(unsigned long *)((A1_new + i) + offset);               
//      unsigned long x3 = *(unsigned long *)(B1_new + i);                          
//      unsigned long x4 = *(unsigned long *)((B1_new + i) + offset);               
//    }

Found some improvements for our internal benchmarks.

Diff Detail

Event Timeline

shchenz created this revision.Aug 25 2021, 11:06 PM
shchenz requested review of this revision.Aug 25 2021, 11:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2021, 11:06 PM
qiucf added a subscriber: qiucf.Aug 26 2021, 12:18 AM
qiucf added inline comments.
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
128

Better as EnableChainCommoning?

shchenz updated this revision to Diff 370162.Sep 1 2021, 10:15 PM
shchenz marked an inline comment as done.

address @qiucf comments and rebase

shchenz added inline comments.Sep 1 2021, 10:17 PM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
128

Good idea.

shchenz planned changes to this revision.Sep 22 2021, 2:22 AM

For a better review, I will split the patch into two patches:
1: NFC patch for the function outlining, like rewriteForBase and rewriteForBucketElement;
2: functionality change for chain common

jsji added inline comments.Oct 20 2021, 2:13 PM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
66

Can we avoid using ' in examples?

432

treat it with? treat it as?

507

Why this can not reuse collectCandidates?

513

We already collected both Dform and DQform buckets before, can we reuse some info to do early exit?

541

TotalCount? ReusableOffsetCount?

542

FirstGroupCount? NewGroupCount?

544

SawSeperator? SawDifferentOffset?

546

Do we have assumption about the order of Element offset? Can we add comments

556

How about 2 reusable offset in 20 offsets? Maybe we should introduce a heuristic to control whether we want to skip the bucket? eg: if the ratio of reusable is very low?

559

// Only one offset?

560

Why sqrt(EleNum)?

579

This is common to 552, can we just us different ChainNum w w/o SawSeperator?

583

Why we need this check here?

620

Can we introduce an option to control, and just set 4 to be the default?

Also, why we can't push this check into prepareBasesForCommoningChains?

628

Can we loop through BBChanged only?

645

How often we will get unsafe offsets? Can we just do early exit in the main loop below?

663

pistart? Can we use a more meaningful name.

shchenz updated this revision to Diff 381169.Oct 21 2021, 1:20 AM
shchenz marked 16 inline comments as done.

address @jsji comments

Thanks for your review. @jsji Updated accordingly.

llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
513

Good idea. We can share the info among all forms. I split this into a new patch D112196.

556

I added a new threshold MaxVarsChainCommon in the new patch which will prevent from commoning too many chains.

560

The minimal for (EleNum/x + x) (1 <= x <= EleNum and x is the chain number) should be when x == sqrt(EleNum)?

579

Maybe we can revisit this comment after I add more comments to this function to make it clear.

583

Suppose there are two chains, we need to make sure, each diff between two adjacent elements is the same among chains.
For example,

after we divided the bucket to two chains A {baseA, offset1, offset2, offset3, offset4}, B {baseB, offset5, offset6, offset7, offset8},
we need to make sure offset1 = offset5, offset2 == offset6, offset3 = offset7, offset4 = offset8.

So we can use two bases and 4 reused offsets.

628

yes, I think we can and we should. I will post another patch for this NFC change as this usage is also existing in updateFormPrep and dispFormPrep

645

I am not sure how often we get unsafe offset. Before we expand a SCEV, it is safe to call this function first. If necessary, I will do more investigation later to confirm for what kind of case, isSafeToExpandis needed

jsji added inline comments.Oct 21 2021, 8:32 AM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
67

offset is reused ? common offset is re-used?

71

Should we use ((A1_new + i) + offset ) to be clearer?

422

isChainCommoningCandidate , to be aligned with isDSFormCandidate etc.

488

Why we have this limit check here, but not in addOneCandidate?

Also we should at least emit some debug log if we are really discarding candidate due to bucket limit here.

491

This should be checked in collectCandiate, as isValidCandidate.

500

Can we also abstract this check so that we can also reuse addOneCandidate to do this.

Something like:

for (auto &B : Buckets) {
   const SCEV *Diff = SE->getMinusSCEV(LSCEV, B.BaseSCEV);
   if ( const auto *Diff = isValidDiff(Diff, LSCEV, B.BaseSCEV)) {
     B.Elements.push_back(BucketElement(Diff, MemI));
     FoundBucket = true;
     break;
   }
 }

then we use isValidConstantDiff for others, but isValidChainCommoningDiff for this.

isValidConstantDiff(Diff, LSCEV, B.BaseSCEV) ={
return dyn_cast<SCEVConstant>(Diff)
}

isValidChainCommoningDiff(Diff, LSCEV, B.BaseSCEV) ={

  assert(LSCEV && "Invalid SCEV for Ptr value.");

  // Don't mess up previous dform prepare.
  if (isa<SCEVConstant>(LSCEV))
    return null_ptr;

  // A single integer type offset.
  if (isa<SCEVUnknown>(LSCEV) && LSCEV->getType()->isIntegerTy())
    return Diff;

  const SCEVNAryExpr *ASCEV = dyn_cast<SCEVNAryExpr>(LSCEV);
  if (!ASCEV)
    return null_ptr;

  for (const SCEV *Op : ASCEV->operands())
    if (!Op->getType()->isIntegerTy())
      return null_ptr;

  return Diff;


}
513

D112196 looks good. But I think we can still do more for this? eg: If both Dform and DQform buckets are empty, then there is no point trying to run this commoning either?

585

How about this?

unsigned ChainNum=FirstOffsetReusedCount / FirstOffsetReusedCountInFirstChain;
if (!SawChainSeparater) 
  ChainNum = (unsigned)sqrt(EleNum);

 CBucket.ChainSize = (unsigned)(EleNum / ChainNum);

 // If this is not a perfect chain(eg: all elements can be put inside
 // commoning chains.), skip now.
 if (CBucket.ChainSize * ChainNum != EleNum)
     return false;

 if (SawChainSeparater) {
 // Check that the offset seqs are the same for all chains?
     for (unsigned i = 1; i < CBucket.ChainSize; i++)
         for (unsigned j = 1; j < ChainNum; j++)
            if (CBucket.Elements[i].Offset !=
                SE->getMinusSCEV(CBucket.Elements[i + j * CBucket.ChainSize].Offset,
                          CBucket.Elements[j * CBucket.ChainSize].Offset))
            return false;

  }

 for (unsigned i = 0; i < ChainNum; i++)
   CBucket.ChainBases.push_back(CBucket.Elements[i * CBucket.ChainSize]);
603

Not all elements?

607

// Check that the offset seqs are the same for all chains?

858

Since we are reusing collectCandidates, this check can be moved out to around line 1488.

882

Why we can't pass isValidChainCommoningCandidate as isValidCandate (yes, need to add one more LSCEV parameter to function prototype) , and pass`addOneCandidateForChainCommoning` as addOneCandiate function too?

Then we don't need the IsChainCommoning bool parameter and conditional check here.

jsji added inline comments.Oct 21 2021, 8:39 AM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
500
isValidChainCommoningDiff(Diff, LSCEV, B.BaseSCEV) ={


  assert(LSCEV && "Invalid SCEV for Ptr value.");

  if (cast<SCEVAddRecExpr>(B.BaseSCEV)->getStepRecurrence(*SE) !=
        cast<SCEVAddRecExpr>(LSCEV)->getStepRecurrence(*SE))
      return null_ptr;

  // Don't mess up previous dform prepare.
  if (isa<SCEVConstant>(LSCEV))
    return null_ptr;

  // A single integer type offset.
  if (isa<SCEVUnknown>(LSCEV) && LSCEV->getType()->isIntegerTy())
    return Diff;

  const SCEVNAryExpr *ASCEV = dyn_cast<SCEVNAryExpr>(LSCEV);
  if (!ASCEV)
    return null_ptr;

  for (const SCEV *Op : ASCEV->operands())
    if (!Op->getType()->isIntegerTy())
      return null_ptr;

  return Diff;


}
shchenz updated this revision to Diff 381492.Oct 22 2021, 2:40 AM
shchenz marked 14 inline comments as done.
shchenz edited the summary of this revision. (Show Details)

address @jsji comments

shchenz added inline comments.Oct 22 2021, 2:54 AM
llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
513

Not exactly, ds/dq forms require the diff between elements must be constants, but we can prepare for non-constant diffs for chain commoning.

jsji accepted this revision as: jsji.Oct 22 2021, 7:52 AM

LGTM. Thanks for working on this.

llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
154

Address clang-format warning please.

559

Can we name this different from line 1019? both are "loopprepare"?

830

This is a new check for updateform/d/dqform, make sure it is intended.

This revision is now accepted and ready to land.Oct 22 2021, 7:52 AM
This revision was landed with ongoing or failed builds.Oct 24 2021, 9:22 PM
This revision was automatically updated to reflect the committed changes.
shchenz marked 3 inline comments as done.

Thanks for your review. @jsji

llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp
830

Yes, this is required for chain common but not needed for update/d/dqform rewrite.

ro added a subscriber: ro.Oct 29 2021, 2:01 AM

This patch broke the Solaris build:

/vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp: In member function ‘bool {anonymous}::PPCLoopInstrFormPrep::prepareBasesForCommoningChains({anonymous}::Bucket&)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp:497:37: error: call of overloaded ‘sqrt(unsigned int&)’ is ambiguous
  497 |     ChainNum = (unsigned)sqrt(EleNum);
      |                                     ^
In file included from /vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/math.h:24,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/bits/std_abs.h:40,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/cstdlib:77,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/bits/stl_algo.h:59,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/algorithm:62,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/Hashing.h:51,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/Optional.h:18,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/STLExtras.h:19,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCFrameLowering.h:15,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCSubtarget.h:16,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp:79:
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:220:21: note: candidate: ‘long double std::sqrt(long double)’
  220 |  inline long double sqrt(long double __X) { return __sqrtl(__X); }
      |                     ^~~~
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:186:15: note: candidate: ‘float std::sqrt(float)’
  186 |  inline float sqrt(float __X) { return __sqrtf(__X); }
      |               ^~~~
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:74:15: note: candidate: ‘double std::sqrt(double)’
   74 | extern double sqrt __P((double));
      |               ^~~~

Can be fixed e.g. by casting EleNum to double.

In D108750#3095793, @ro wrote:

This patch broke the Solaris build:

/vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp: In member function ‘bool {anonymous}::PPCLoopInstrFormPrep::prepareBasesForCommoningChains({anonymous}::Bucket&)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp:497:37: error: call of overloaded ‘sqrt(unsigned int&)’ is ambiguous
  497 |     ChainNum = (unsigned)sqrt(EleNum);
      |                                     ^
In file included from /vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/math.h:24,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/bits/std_abs.h:40,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/cstdlib:77,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/bits/stl_algo.h:59,
                 from /vol/gcc-10/.11.4-gas-64/include/c++/10.1.0/algorithm:62,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/Hashing.h:51,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/Optional.h:18,
                 from /vol/llvm/src/llvm-project/dist/llvm/include/llvm/ADT/STLExtras.h:19,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCFrameLowering.h:15,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCSubtarget.h:16,
                 from /vol/llvm/src/llvm-project/dist/llvm/lib/Target/PowerPC/PPCLoopInstrFormPrep.cpp:79:
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:220:21: note: candidate: ‘long double std::sqrt(long double)’
  220 |  inline long double sqrt(long double __X) { return __sqrtl(__X); }
      |                     ^~~~
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:186:15: note: candidate: ‘float std::sqrt(float)’
  186 |  inline float sqrt(float __X) { return __sqrtf(__X); }
      |               ^~~~
/vol/gcc-10/.11.4-gas-64/lib/gcc/amd64-pc-solaris2.11/10.1.0/include-fixed/iso/math_iso.h:74:15: note: candidate: ‘double std::sqrt(double)’
   74 | extern double sqrt __P((double));
      |               ^~~~

Can be fixed e.g. by casting EleNum to double.

Thanks for reporting. 7591d2103222d712b589600a7f7bc19683152bdd is committed