X86 and AArch64 expand it as libcall inside the target. And PowerPC also want to expand them as libcall for P8. So, I implement this in the legalizer and remove the code for X86/AArch64 to avoid the duplicate code.
Details
- Reviewers
craig.topper RKSimon spatel efriedma ebevhan uweigand t.p.northover - Group Reviewers
Restricted Project - Commits
- rGebdd20f430c4: Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
15,730 ms | x64 debian > LLVM.Bindings/Go::go.test | |
80 ms | x64 windows > LLVM.CodeGen/XCore::threads.ll |
Event Timeline
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
1058 ↗ | (On Diff #304765) | Maybe limit the scope to something like isSupportedCastOperation? |
X86 uses Custom action based on the output type and checks the input type in the Custom handler. If the input type and output type are legal the custom handler returns the SDValue that was passed to the custom handler. That tells LegalizeDAG that no custom handling was required and it should be considered legal. Would that work for your case?
I think not. The way why X86 works is that, return SDValue() for custom handler means legal. But what PowerPC wants is to expand it. We didn't have good way to tell the legalizer to expand it when we are inside the custom handler as far as I know...
Return SDValue() means Expand. Returning the original op means Legal. Returning any other node means custom handling was done. X86 does all 3.
I have to reopen this revision as the custom way still cannot meet PowerPC's requirement, because it will try to expand it first. From the implementation, it seems that we are trying our best to expand the node without checking the legalization of the expanded node, which results in several runtime call eventually. And sometimes, the expand never fall into libcall for some opcode ... See
case ISD::SINT_TO_FP: case ISD::STRICT_SINT_TO_FP: Tmp1 = ExpandLegalINT_TO_FP(Node, Tmp2); Results.push_back(Tmp1); if (Node->isStrictFPOpcode()) Results.push_back(Tmp2); break;
What we are trying to do is to call the runtime call for such operations.
Address Reviewer's comments.
Yeah if you want to emit a libcall that is an issue. AArch64 and X86 both worked around this by emitting libcalls directly from Custom handling. See the callers of LowerF128Call in both targets. But this change might be better.
Thank you for the information. I will try to remove those code from AArch64 and X86 and use the way I did in D91757 to common all the code.
I know that Simon mentioned that perhaps we should limit this to cast operations but TBH, I think there may be value in making this general and passing in the SDNode itself rather than opcode/type(s). Then all of this logic can be greatly simplified into something like this:
void SelectionDAGLegalize::LegalizeOp(SDNode *Node) { LLVM_DEBUG(dbgs() << "\nLegalizing: "; Node->dump(&DAG)); if (TLI.opRequiresLibCall(Node)) ConvertNodeToLibcall(Node);
And presumably, there would be another function for the TLI to provide the RTLIB::Libcall for the specific SDNode.
And of course, this needs some tests.
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
1081 ↗ | (On Diff #305485) | The semantics here are rather subtle and I don't think the name describes the semantics adequately. virtual bool castRequiresLibCall(unsigned Op, EVT SrcVT, EVT DestVT) const { return false; } And then of course, flip the ternary operator above. |
Thank you for Nemanjai's comments. I have changed it as suggested as it is indeed more clean, but with a different name.
I have removed all the places that call the LowerF128Call Craig mentioned for AArch64 and X86 and do it inside legalizer in an uniform way. However, I see some test change with this and I am not sure if it is expected change due to the different implementation of expanding the libcall. Can someone please help me double confirm this ?
I will rebase D91757 for powerpc side change.
It would appear to me that the only test case changes are that the libcalls were turned into tail calls - but of course my knowledge of X86 and ARM asm is next to zero. Presumably the original lowering didn't set the necessary flags on the call and now the legalizer does so.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
3038 | A return after llvm_unreachable? Does the build compiler warn about no return here? | |
llvm/test/CodeGen/AArch64/arm64-fp128.ll | ||
7 | These appear to have been turned into tail calls with no stack frame. That seems like a positive sideeffect but an ARM/AArch64 expert should certainly confirm that. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
3038 | No... Also, there is already such kind of code. See SDValue AArch64TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { LLVM_DEBUG(dbgs() << "Custom lowering: "); LLVM_DEBUG(Op.dump()); switch (Op.getOpcode()) { default: llvm_unreachable("unimplemented operand"); return SDValue(); |
Add the PowerPC side support. I have verify the functionality with PowerPC and it works well. Could someone from AArch64 or X86 verify the functionality for this patch as I see some test change though it looks good (I didn't have the X86 and AArch64 env) ?
Address comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp | ||
---|---|---|
4375 | Ah, the evil for copy-paste. Yes, it is legal here. I will update the comments. For PowerPC, i1 is legal type, so, we need this part of code to extend it to the type that has libcall. | |
4509 | Good catch. I will update it with cast. In fact, the type legalizer didn't skip this extra operand which seems wrong. See SoftenFloatRes_FP_ROUND(). We can fix it later I think. |
Could we check whether the input type is fp128 at the begining of PPCTargetLowering::LowerFP_TO_INT and if it is a fp128 input type, we directly ConvertNodeToLibcall?
Besides, I found a strange thing:
In PPCTargetLowering::LowerFP_TO_INT, we will return original op if the input type is fp128, which means in some subtarget , fp128's fptoint is legal and have corresponding instructions.
But what you are going to do is to do a hack and expand it to libcall at the beginning of legalizeOp(...). It seems there is a conflict here.
If I understand correctly, the situation is:
- fptoint is legal on subtarget power9, but needs libcall to implement on power8 or earlier.
- We can't set fptoint as Expand on power8, since in this case Expand won't fallthrough to libcall.
- ConvertNodeToLibcall can't be referenced in target-specific lowering.
- Besides, the framework only decides the action of fptoint by return-value type. But things like 'double to i32' doesn't need libcall on power8.
I am not sure if this is feasible as ConvertNodeToLibcall is for legalizer not target lower.
Besides, I found a strange thing: In PPCTargetLowering::LowerFP_TO_INT, we will return original op if the input type is fp128, which means in some subtarget , fp128's fptoint is legal and have corresponding instructions. But what you are going to do is to do a hack and expand it to libcall at the beginning of legalizeOp(...). It seems there is a conflict here.
That is something that we did to fix the limitation of current setOperationAction mechanism and I am fine if we have any better solution for this... In fact, they are not conflict as the hook means that, expand it as libcall without querying the operation action as the operation action cannot handle the case that there are more than one input type.
We used to handle this by having it as custom and inside the hook, different the behavior by different return value. But we have no way to do something like this:
Legalize the sint_to_fp as libcall if the source type is fp128.
Any thoughts ?
The Expand path falls back to Libcall if the expand fails. So if you return SDValue() from the custom handler, what does the expand for sint_to_fp try to right now for your fp128 cases?
I know do know the Expand for FP_EXTEND will never fail and tries to do a stack memory operation. But maybe we're missing a extload legality check there that we could use to make it fail.
Yeah, we can make FP_EXTEND work if check the legality. For ExpandLegalINT_TO_FP(), it will try to use the stack memory operation to convert the int to f64 first, then, extend it to f128. We also can check the legality of the FPRound/FPExtend to make it fail. Then, we will hit assertion: assert(!isSigned && "Legalize cannot Expand SINT_TO_FP for i64 yet"); So, we can make it fail to if it is not signed. Then, get the final step to have a complicate algorithm to do the convert. But we can make it fail if check the legality.
This is the first attempt I made before but hit too many issues and I feel that the expand will try best to expand it without checking the legality. If we have concern on opening such a backdoor for legalization action, I can update the patch to add the legality check to make the expand fail.
Make the expand failed for fp_to_int/int_to_fp/fp_round/fp_extend as well as the strict counterpart.
As we have the time pressure for this, if there is concern from other target about this patch, I will abandon this patch and do it inside PowerPC as what AArch64/X86 did now. Personally, I like the idea to common them into legalizer.
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp | ||
---|---|---|
4397 | Good catch. |
clang-tidy: warning: invalid case style for variable 'dl' [readability-identifier-naming]
not useful