This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][IGLP] Parameterize the SchedGroup processing / linking in Solver
ClosedPublic

Authored by jrbyrnes on Apr 27 2023, 4:35 PM.

Details

Summary

Currently the PipelineSolver processes SchedGroups in BOTTOM_UP manner. However, there is no compelling reason to require this. Providing the option for TOP_DOWN affords greater experimentation capability, and make usage a bit more intuitive. Importantly, it makes designing rules much easier.

Diff Detail

Event Timeline

jrbyrnes created this revision.Apr 27 2023, 4:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 4:35 PM
jrbyrnes requested review of this revision.Apr 27 2023, 4:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 4:35 PM
jrbyrnes updated this revision to Diff 517934.Apr 28 2023, 8:23 AM

Actually demonstrate TOP_DOWN direction

kerbowa added inline comments.May 8 2023, 9:42 AM
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
838

Is this strategy just a demo of this new feature? What are your thoughts on upstreaming this?

jrbyrnes added inline comments.May 8 2023, 10:17 AM
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
838

Hey -- thanks for bringing that up.

It is to demo the new feature -- put another way, it is to facilitate lit testing of the new feature.

The assumption is that clients of iglp_opt have an intimate understanding of the i32s passed to the builtin. However, I see that mixing testing and features is less than ideal.

Maybe it makes the most sense to have incremental tests (under iglp_opt(1)) that is superseded by the singlewave strategy which incorporates all the new features?

Otherwise, we will need a mechanism to test new features without changing the existing behavior of the current builtins -- whether it be under iglp_opt or a separate intrinsic.

jrbyrnes updated this revision to Diff 520741.May 9 2023, 10:23 AM

Use iglp_opt(1) (to be superseded by MFMASmallGemmSingleWaveOpt) for testing

Ping -- worth mentioning: I believe that we should bring this feature in even if not convinced of the iglp_opt(1) implementation (ditto for rules). The caveat to that statement, though, is the approach to testing.

kerbowa added inline comments.May 25 2023, 10:36 AM
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
258

Could this just be a boolean value?

399

Combine the two LLVM_DEBUG macros?

415–416

Are we able to still use iterators here?
auto I = IsBottomUp ? SyncPipeline.rbegin() : SyncPipeline.begin();
auto E = IsBottomUp ? SyncPipeline.rend() : SyncPipeline.end();

425–426

I don't think we should remove the comment explaining MakePred.

536

Just use iterators?

674–675

Ditto.

793

Does this really need to be virtual if the derived class is just setting a flag to up/down? What special handling may we need here for different strategies beyond just setting the flag in the constructor?

jrbyrnes updated this revision to Diff 525750.May 25 2023, 12:30 PM
jrbyrnes marked 8 inline comments as done.

Review comments

llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
415–416

This doesn't work out of the box. SmallVector::iterator (T *) isn't compatible with SmallVector::reverse_iterator (std::reverse_iterator<T*>). I can template this code based on the iterator, but I thought the current implementation was the cleanest way to do it (unfortunately). We can switch to ranges in c++20. Please correct me if I'm missing something.

kerbowa added inline comments.May 25 2023, 2:04 PM
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
415–416

Oh right. Could you use two different for_each?
f = loop(GroupA.link(GroupB)
if (IsBottomUp)

for_each(rbegin(), rend(), f)

else

for_each(begin(), end(), f)
jrbyrnes updated this revision to Diff 526147.May 26 2023, 11:45 AM

Template against iterators (as per offline discussion)

kerbowa accepted this revision.May 30 2023, 8:53 AM

LGTM, thanks!

llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
802

Remove

This revision is now accepted and ready to land.May 30 2023, 8:53 AM