This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
ClosedPublic

Authored by hsmhsm on May 26 2020, 3:38 AM.

Details

Summary

While clustering mem ops, AMDGPU target needs to consider number of clustered bytes
to decide on max number of mem ops that can be clustered. This patch adds support to pass
number of clustered bytes to target mem ops clustering logic.

Diff Detail

Event Timeline

hsmhsm created this revision.May 26 2020, 3:38 AM
hsmhsm edited the summary of this revision. (Show Details)May 26 2020, 3:39 AM
hsmhsm edited the summary of this revision. (Show Details)
hsmhsm updated this revision to Diff 266157.May 26 2020, 3:52 AM

Taken care of clang formatting.

Missing tests

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
160–167

It would be better to not depend on the memory operands here, but this belongs in a helper function some kind of not (and this can also sink down to the use)

hsmhsm marked an inline comment as done.May 26 2020, 8:55 AM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
160–167

Hi @arsenm

Did you mean here the helper function as a kind of below?

unsigned getDstMemOperandSize(const MachineInstr *MI) const {
  if (!MI || MI->memoperands_empty())
    return 0;

  return MI->memoperands().front()->getSize();
}

And, use above helper function as below?

unsigned WidthA = getDstMemOperandSize(CI.Last);
unsigned WidthB = getDstMemOperandSize(&MI);

Missing tests

There is NO functionality related changes in this patch. What kind of tests that you suggest here?

hsmhsm updated this revision to Diff 266255.May 26 2020, 9:49 AM

Have taken care of Matt's review comments

arsenm added inline comments.May 26 2020, 10:20 AM
llvm/lib/CodeGen/MachineScheduler.cpp
1576–1577

This won't correctly handle multiple mem operands

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
160–167

Yes, but it's worse to rely on the memory operands here than getting this from the instruction opcode / operand

hsmhsm updated this revision to Diff 266428.May 26 2020, 10:39 PM

Have taken care of second set of review comments (by Matt)

hsmhsm updated this revision to Diff 266429.May 26 2020, 10:57 PM

Replace few if stmts with ternary expression

foad added inline comments.May 27 2020, 1:42 AM
llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
160–167

Right, getMemOperandsWithOffset could be extended to return the width. Some targets already have an internal function getMemOperandsWithOffsetWidth which does that.

168–169

The comment explains that we don't really want to limit the size of the cluster here, so it's probably best to pass in a small dummy value like 2 instead of WidthA + WidthB.

hsmhsm updated this revision to Diff 266491.May 27 2020, 4:57 AM

Modified getMemOperandsWithOffset() to pass Width as an argument. Note that
SIInstrInfo::getMemOperandsWithOffset() needs to be extended to compute and
assign Width. It will be implemented in next patch.

hsmhsm updated this revision to Diff 266515.May 27 2020, 6:28 AM

getMemOperandsWithOffset() implemented by other targets needs compute Width too.
If it is not computing it, then, default computation is based on memoperands() api.

hsmhsm updated this revision to Diff 266538.May 27 2020, 7:37 AM

Compute Width within SIInstrInfo::getMemOperandsWithOffset()

hsmhsm marked 4 inline comments as done.May 27 2020, 9:01 AM
hsmhsm added inline comments.
llvm/lib/CodeGen/MachineScheduler.cpp
1576–1577

Taken care

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
160–167

Taken care

160–167

Taken care

168–169

Taken care

PING (since it is bit of high priority)

foad added inline comments.May 28 2020, 2:57 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2044

It's a bit silly that this function has "Width" in its name, and getMemOperandsWithOffset doesn't, but both of them take same Width argument.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
298–299

Move this below the mayLoadOrStore check. Then, does getDstMemOperand ever fail?

llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
2972

Doesn't AccessSize give you the width already?

llvm/lib/Target/X86/X86InstrInfo.cpp
3700–3701

Perhaps MemOp.hasOneMemOperand() would be more appropriate? (Or what should the behaviour be if there is more than one?)

Please at least add a FIXME to get the width without relying on memoperands, or get an opinion from the X86 maintainers if this is OK.

hsmhsm updated this revision to Diff 266844.May 28 2020, 6:25 AM

Have taken care of review comments made by Jay.

hsmhsm marked 4 inline comments as done.May 28 2020, 6:32 AM
hsmhsm added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2044

You are right, I somehow had missed it. Fixed it.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
298–299

Fixed it. However, I do not know much about the functionality guarantee of mayLoadOrStore(). So for the purpose of safety, I have added an if stmt to check if DstOp is null or not, and return false if it is null.

llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
2972

You are right, I somehow had missed it. Fixed it.

llvm/lib/Target/X86/X86InstrInfo.cpp
3700–3701

Yes, you are right. I have added a FIXME comment for now. I will get in touch with X86 maintainers and will get it fixed as soon as possible.

hsmhsm updated this revision to Diff 266867.May 28 2020, 7:08 AM

One of the fixes for review comments resulted in one unit test failure, hence reverted back to original state.

hsmhsm marked an inline comment as done.May 28 2020, 7:17 AM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
298–299

Apparently, above if stmt resulted in one unit test failure. Hence reverted it back. And again, assuming here that DstOp will not be null is also NOT guaranteed, We seem to be getting null in many unit test cases, and hence assuming it non-null will again result in many test failures. Now there two possibilities
(1) getDstMemOperand() might not be handling all the cases OR
(2) getDstMemOperand() is handling all the cases, but, it is expected that DstOp will be null in few cases even though, mayLoadOrStore() is true.

foad added inline comments.May 29 2020, 5:55 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
304–319

I would prefer this logic to be folded into getMemOperandsWithOffsetWidth itself, where we already test for all the different instruction types (isMUBUF and so on). It might make sense for each of the cases in getMemOperandsWithOffsetWidth to set DstOp and then fall through to some common code at the end of the function that sets Width based on DstOp.

305

Hopefully if you follow the advice above, you will find DstOp is never null (because it is only ever null here in cases where getMemOperandsWithOffsetWidth is going to return false anyway).

If you do still find cases where DstOp is null then please debug them to see what has gone wrong.

2781–2786

This code can be removed now. You should be able to use the Width values returned by getMemOperandsWithOffsetWidth instead. But you can leave that for a later patch if you prefer.

hsmhsm marked 3 inline comments as done.May 29 2020, 6:20 AM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
304–319

In fact, I initially thought about it. However, first look at the code within getMemOperandsWithOffsetWidth() made me to feel on the other way around. I felt that this function requires clean-up, and hence did not want to touch it. Anyway, let me follow your suggestion. I will experiment and prepare a patch.

305

I will hope for the best :)

2781–2786

I would prefer to make this change in a separate patch, after the current patch is successfully merged.

hsmhsm marked an inline comment as done.May 29 2020, 7:22 AM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
305

DstOp is null in many unit test-cases. I am not sure, what is going-on here. I need to debug it. It may take some time. I am not sure, if it is even worth considering the current situation and other P1 tickets at my hand. Anyway, let me see what best I can do here.

hsmhsm marked an inline comment as done.May 30 2020, 6:14 AM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
305

Hi @foad

Only above review comment to be taken care in the patch, and all other review comments are taken care. However, since we will not be going to use the width value until we will be able to come-up with a safe heuristic for mem ops cluster threshold, we can safely merge this patch, and in the next patch, we can address this review comment. That way, we no need to block this entire big patch to be in hold condition, and we do not need to deal with merge conflicts often and often. What do you say?

hsmhsm updated this revision to Diff 267628.Jun 1 2020, 8:24 AM

Make sure that Width is properly computed within SIInstrInfo::getMemOperandsWithOffsetWidth()

foad accepted this revision.Jun 1 2020, 9:12 AM

Looks OK to me with some minor cleanups (see inline comments).

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
294

Do you need this function at all? Can't you use RI.getOpSize instead?

299–301

Could use RI.getRegClassForReg here.

302

You don't need the outer parentheses here.

376

You can remove some duplicated code by making this an "else", so there is only one "return true" for all BUF instructions.

This revision is now accepted and ready to land.Jun 1 2020, 9:12 AM

Thanks Jay, I will take care of these minor comments in a next immediate patch.

This revision was automatically updated to reflect the committed changes.