While clustering mem ops, AMDGPU target needs to consider number of clustered bytes
to decide on max number of mem ops that can be clustered. This patch adds support to pass
number of clustered bytes to target mem ops clustering logic.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Missing tests
llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp | ||
---|---|---|
160–167 | It would be better to not depend on the memory operands here, but this belongs in a helper function some kind of not (and this can also sink down to the use) |
llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp | ||
---|---|---|
160–167 | Hi @arsenm Did you mean here the helper function as a kind of below? unsigned getDstMemOperandSize(const MachineInstr *MI) const { if (!MI || MI->memoperands_empty()) return 0; return MI->memoperands().front()->getSize(); } And, use above helper function as below? unsigned WidthA = getDstMemOperandSize(CI.Last); unsigned WidthB = getDstMemOperandSize(&MI); |
There is NO functionality related changes in this patch. What kind of tests that you suggest here?
llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp | ||
---|---|---|
160–167 | Right, getMemOperandsWithOffset could be extended to return the width. Some targets already have an internal function getMemOperandsWithOffsetWidth which does that. | |
168–169 | The comment explains that we don't really want to limit the size of the cluster here, so it's probably best to pass in a small dummy value like 2 instead of WidthA + WidthB. |
Modified getMemOperandsWithOffset() to pass Width as an argument. Note that
SIInstrInfo::getMemOperandsWithOffset() needs to be extended to compute and
assign Width. It will be implemented in next patch.
getMemOperandsWithOffset() implemented by other targets needs compute Width too.
If it is not computing it, then, default computation is based on memoperands() api.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
2044 | It's a bit silly that this function has "Width" in its name, and getMemOperandsWithOffset doesn't, but both of them take same Width argument. | |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
298–299 | Move this below the mayLoadOrStore check. Then, does getDstMemOperand ever fail? | |
llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp | ||
2972 | Doesn't AccessSize give you the width already? | |
llvm/lib/Target/X86/X86InstrInfo.cpp | ||
3700–3701 | Perhaps MemOp.hasOneMemOperand() would be more appropriate? (Or what should the behaviour be if there is more than one?) Please at least add a FIXME to get the width without relying on memoperands, or get an opinion from the X86 maintainers if this is OK. |
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
2044 | You are right, I somehow had missed it. Fixed it. | |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
298–299 | Fixed it. However, I do not know much about the functionality guarantee of mayLoadOrStore(). So for the purpose of safety, I have added an if stmt to check if DstOp is null or not, and return false if it is null. | |
llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp | ||
2972 | You are right, I somehow had missed it. Fixed it. | |
llvm/lib/Target/X86/X86InstrInfo.cpp | ||
3700–3701 | Yes, you are right. I have added a FIXME comment for now. I will get in touch with X86 maintainers and will get it fixed as soon as possible. |
One of the fixes for review comments resulted in one unit test failure, hence reverted back to original state.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
298–299 | Apparently, above if stmt resulted in one unit test failure. Hence reverted it back. And again, assuming here that DstOp will not be null is also NOT guaranteed, We seem to be getting null in many unit test cases, and hence assuming it non-null will again result in many test failures. Now there two possibilities |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
304–319 | I would prefer this logic to be folded into getMemOperandsWithOffsetWidth itself, where we already test for all the different instruction types (isMUBUF and so on). It might make sense for each of the cases in getMemOperandsWithOffsetWidth to set DstOp and then fall through to some common code at the end of the function that sets Width based on DstOp. | |
305 | Hopefully if you follow the advice above, you will find DstOp is never null (because it is only ever null here in cases where getMemOperandsWithOffsetWidth is going to return false anyway). If you do still find cases where DstOp is null then please debug them to see what has gone wrong. | |
2781–2786 | This code can be removed now. You should be able to use the Width values returned by getMemOperandsWithOffsetWidth instead. But you can leave that for a later patch if you prefer. |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
304–319 | In fact, I initially thought about it. However, first look at the code within getMemOperandsWithOffsetWidth() made me to feel on the other way around. I felt that this function requires clean-up, and hence did not want to touch it. Anyway, let me follow your suggestion. I will experiment and prepare a patch. | |
305 | I will hope for the best :) | |
2781–2786 | I would prefer to make this change in a separate patch, after the current patch is successfully merged. |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
305 | DstOp is null in many unit test-cases. I am not sure, what is going-on here. I need to debug it. It may take some time. I am not sure, if it is even worth considering the current situation and other P1 tickets at my hand. Anyway, let me see what best I can do here. |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
305 | Hi @foad Only above review comment to be taken care in the patch, and all other review comments are taken care. However, since we will not be going to use the width value until we will be able to come-up with a safe heuristic for mem ops cluster threshold, we can safely merge this patch, and in the next patch, we can address this review comment. That way, we no need to block this entire big patch to be in hold condition, and we do not need to deal with merge conflicts often and often. What do you say? |
Make sure that Width is properly computed within SIInstrInfo::getMemOperandsWithOffsetWidth()
Looks OK to me with some minor cleanups (see inline comments).
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
294 | Do you need this function at all? Can't you use RI.getOpSize instead? | |
299–301 | Could use RI.getRegClassForReg here. | |
302 | You don't need the outer parentheses here. | |
376 | You can remove some duplicated code by making this an "else", so there is only one "return true" for all BUF instructions. |
This won't correctly handle multiple mem operands