This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size
Closed, Public

Authored by hsmhsm on Jun 23 2020, 10:07 AM.

Details

Summary

Use both (1) the clustered bytes and (2) the cluster length to decide on
the max number of mem ops that can be clustered. When loads are, on average,
dword or smaller, use 5 as the max threshold; otherwise use 4. This
heuristic is based purely on the experiments conducted; there is no
analytical derivation behind it.
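
As a rough illustration of the rule described above, the following is a minimal sketch and not the code from the patch. The function name `fitsClusterLimit` and its parameters are hypothetical; it assumes "dword" means 4 bytes and that the average load size is the clustered bytes divided by the cluster length.

```cpp
#include <cassert>

// Hypothetical helper for illustration only; not taken from the patch.
// numLoads: number of mem ops in the candidate cluster (cluster length).
// numBytes: total bytes accessed by those mem ops (clustered bytes).
bool fitsClusterLimit(unsigned numLoads, unsigned numBytes) {
  assert(numLoads > 0 && numBytes > 0 && "expects a non-empty cluster");
  // The average load is a dword (4 bytes) or smaller exactly when
  // numBytes <= 4 * numLoads; this avoids an integer division.
  const unsigned maxLoads = (numBytes <= 4 * numLoads) ? 5 : 4;
  return numLoads <= maxLoads;
}
```

Under these assumptions, five dword loads (20 bytes) fit within the limit, while five 8-byte loads (40 bytes) do not, because the threshold drops to 4 once the average load exceeds a dword.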

Event Timeline

hsmhsm created this revision. Jun 23 2020, 10:07 AM
hsmhsm added a comment. Edited Jun 23 2020, 10:13 AM

This patch was previously reviewed, accepted, and committed via https://reviews.llvm.org/D81085. However, I had to revert it for the reasons noted in https://reviews.llvm.org/D81085. Those blocking issues have since been resolved via https://reviews.llvm.org/D81649. In the meantime, a few of the test cases updated in this patch were changed and now conflicted, so I had to fix those test cases again and open this new revision.

This revision is now accepted and ready to land. Jun 23 2020, 11:57 AM
This revision was automatically updated to reflect the committed changes.