This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] extend PPCPreIncPrep Pass for ds/dq form
ClosedPublic

Authored by shchenz on Sep 2 2019, 9:51 PM.

Details

Summary

Now, PPCPreIncPrep pass changes a loop to update form and update all load/store with same base accordingly.

We can do more for load/store with same base, for example, convert load/store with same base to ds/dq form. Like:

Generically, this means transforming loops like this:
for (int i = 0; i < n; ++i) {
  unsigned long x1 = *(unsigned long *)(p + i + 5);
 unsigned long x2 = *(unsigned long *)(p + i + 9);
}
to look like this:
unsigned NewP = p + 5;
for (int i = 0; i < n; ++i) {
    unsigned long x1 = *(unsigned long *)(i + NewP);
    unsigned long x2 = *(unsigned long *)(i + NewP + 4);
}

Diff Detail

Event Timeline

shchenz created this revision.Sep 2 2019, 9:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 2 2019, 9:51 PM

@hfinkel Hi Hal, could you help to have a glance at this patch and give comment about if it is the right direction to implement such transform based on PPCPreIncPrep pass you have implemented. Thanks a lot.

shchenz updated this revision to Diff 218402.Sep 2 2019, 9:56 PM

delete incorrectly added file

shchenz added a comment.EditedSep 9 2019, 7:15 AM

@hfinkel Hi Hal, sorry for pushing you on this issue. Could you give some comments for this patch? As I tested on Power9, the patch gains 9.6% on cpu2017 benchmark exchanges, 2.0% on benchmark x264 for base rate, for peak rate, gains 2.1% on cactuBSSN and 1.9% on xalancbmk. Thanks.

@hfinkel Hi Hal, sorry for pushing you on this issue. Could you give some comments for this patch? As I tested on Power9, the patch gains 9.6% on cpu2017 benchmark exchanges, 2.0% on benchmark x264 for base rate, for peak rate, gains 2.1% on cactuBSSN and 1.9% on xalancbmk. Thanks.

Sorry for the delay. Yes, this looks like a reasonable approach. It will be easier to review, however, if you post the patch without the file renaming (even if you do the renaming on commit) so that it's easier to see what's changing.

shchenz updated this revision to Diff 219486.Sep 10 2019, 1:00 AM

revert file name changing to get a better diff shown.

Thanks for your confirm @hfinkel .
I am still working on the final patch. I will request code review later after I work out final patch.

shchenz planned changes to this revision.Sep 10 2019, 5:09 AM
shchenz edited the summary of this revision. (Show Details)Oct 9 2019, 8:17 PM
shchenz added reviewers: jsji, nemanjai, Restricted Project.
shchenz updated this revision to Diff 224239.Oct 9 2019, 8:19 PM

Final patch is made. Could you help to review? Thanks a lot.

gentle ping

jsji added a comment.Nov 4 2019, 2:09 PM

A few questions first.

llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
85

We are also doing DS/DQ now, why we still keep the max-vars the same as before? Shall we increase it a little bit?

Looks like NO according to original comments?
So, maybe we should still keep original comments about which is a little over half of the allocatable register set, so that it will be clearer about why we choose 16 here.

90

Yes, 9 < 16, but why 9 here?

91

Why we have 3/3/3 for each type ? Should they be treated equally? Do you have any statistics to support the decision here?

139

Maybe add comment here about why we want to assign 4, 16 in enum.

170

Typo: Update not Updata

llvm/test/CodeGen/PowerPC/swaps-le-1.ll
167–178

This testcase was broken, see https://reviews.llvm.org/D69733 for a fix.
We may need to rebase after it.

jsji requested changes to this revision.Nov 5 2019, 1:49 PM

Overall looks promising!

llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
105

If would be not profitable if one common base -> It would not be profitable if the common base?

106

can choose -> should already be able to

107

Set default to minmal profitable value 2 and make it as a option. -> Set minimal profitable value default to 2 and make it as an option.

207

There is no parameter of DispConstraint. Describe InstrFrom instead.

364

ReminderOffsetInfo -> RemainderOffsetInfo for all places below.

368

first BucketElement index with a reminder key... -> the index of first BucketElement whose remainder is equal to key.

386

Maybe use FIXME instead of TODO here to be consistent.
base select strategy-> base selection strategy.

412

Why we return true when no update needed? We don't need to rewrite anything.

507

also be a update form -> also be an update form

508

if the increasing step is a multipler of 4 -> if the stride is a multiple of 4 ?

511

Why we would like to always prefer UpdateForm?
What if I would like to disable UpdateForm and only do DS/DQ form ?

Also even if we prefer UpdateForm, do we need to consider the MaxVarsUpdateForm here? Or else, we may actually generate more update form than expected?

This revision now requires changes to proceed.Nov 5 2019, 1:49 PM
shchenz marked 19 inline comments as done.Nov 6 2019, 12:38 AM

Very appriciate for your reviewing.

llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
412

Sorry for confusion comment. We still need to rewrite the load/stores based on the first value. Here I mean we don't need to adjust the elements in BucketChain, since first value is most profitable and all elements already substract first value when collecting them.

511

This is a good question. Yes, for my understanding, I always try to use update form since we can save one addi, no explicit addi to loop iteration variable. But as you said, it will replace DS Form with Update Form. Two issues we meet:
issue1: counter DSFormChainRewritten/UpdateFormChainRewritten are not accurate.
issue2: Update form number is not under control of MaxVarsUpdateForm as MaxVarsUpdateForm is only used when collect update candidates.

issue1 is fixed.
Add an option PreferUpdateForm to control should we choose update form if one chain is both ds and update form.
If this option is false, no issue for now. All three forms have no interact.
If this option is true, we still meet issue2. The original reason we add MaxVarsUpdateForm and MaxVarsDSForm is to prevent from inserting too many PHIs and since we already use this limit in collecting candidates, so I guess no matter we choose update or ds, no impact to the total PHI numbers, One more PHI in update form, but one less PHI in ds form?
Default value for PreferUpdateForm is true.

llvm/test/CodeGen/PowerPC/swaps-le-1.ll
167–178

Yeah, Thanks for reminder, I will rebase after that patch gets committed.

shchenz updated this revision to Diff 228007.Nov 6 2019, 12:41 AM
shchenz marked 2 inline comments as done.

address @jsji comments

shchenz updated this revision to Diff 228997.Nov 12 2019, 6:54 PM

rebase to pick up the commit for https://reviews.llvm.org/D69733

jsji accepted this revision as: jsji.Nov 12 2019, 7:32 PM

LGTM.

llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
240

Should change the name here, we are doing more than pre-inc. now.

440

TODO: -> FIXME:

514

Why not move this down to line 522?

515

Seems non-relevant change, why moving this after BasePtrIncSCEV check?

518

multipler -> multiple

559

Why not init it to nullptr as well?

587

Can we add some comments about why we need to create new GEP at the end of every predecessor BB here?

jsji accepted this revision.Nov 12 2019, 7:32 PM
This revision is now accepted and ready to land.Nov 12 2019, 7:32 PM
shchenz marked 6 inline comments as done.Nov 17 2019, 6:32 PM

Thanks for your reviewing.

llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
240

Yes, there will be another NFC patch to rename the pass and file name.

This revision was automatically updated to reflect the committed changes.