In some cases, AArch64 uses SBFMXri + EXTRACT_SUBREG to implement "sign_extend_inreg (truncate)".
However, the COPY introduced for the EXTRACT_SUBREG is usually erased after Register Coalescing,
so if we later want to zext this result, we must avoid relying on the implicit zero-extend of a 32-bit def.
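To make the hazard concrete, here is a minimal IR sketch of the pattern as I understand it; the function and value names are hypothetical (this is not the test from this patch), and it assumes the usual DAGCombine fold of a matching shl/ashr pair into sign_extend_inreg:

```llvm
; Hypothetical reproducer of the pattern described above.
define i64 @sext_inreg_then_zext(i64 %x) {
  %t = trunc i64 %x to i32
  %s = shl i32 %t, 24
  %a = ashr i32 %s, 24        ; folds to sign_extend_inreg(trunc %x), i8
  ; Selected as SBFMXri (e.g. sbfx x0, x0, #0, #8) + EXTRACT_SUBREG, so
  ; bits [63:32] of the X register hold copies of the sign bit, not zeros:
  ; the zext below must be kept, it is not implicitly done.
  %z = zext i32 %a to i64
  ret i64 %z
}
```

Since SBFMXri writes sign bits into bits [63:32], treating its 32-bit subregister result as implicitly zero-extended becomes unsound once the COPY is coalesced away.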
llvm/lib/Target/AArch64/AArch64ISelLowering.h:436
Could we fix isBitfieldExtractOpFromSExtInReg to produce SBFMWri instead? The other cases here seem to be nodes that we'd have to pessimize in the general case to ensure they zero the high bits on the off-chance they're needed, but for sign-extension it's just a quirk of the implementation.
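For contrast, a minimal sketch (hypothetical names, same assumed shl/ashr fold) of the case where there is no truncate to look through, so the 32-bit form is the natural selection:

```llvm
; With a plain i32 input there is no truncate, so selection can use the
; W-register form directly (SBFMWri, printed as sxtb/sbfx on w registers).
define i32 @sext_inreg_i32(i32 %x) {
  %s = shl i32 %x, 24
  %a = ashr i32 %s, 24        ; sign_extend_inreg %x, i8
  ret i32 %a
}
```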
llvm/test/CodeGen/AArch64/aarch64-avoid-implicit-zext-for-SIGN_EXTEND_INREG.ll:3
Could you try to simplify this a bit more? There are lots of stray annotations, both calls appear to be unnecessary, and as far as I can tell %bf.load could just as easily be a function argument.
llvm/lib/Target/AArch64/AArch64ISelLowering.h:436
I guess the main job of isBitfieldExtractOpFromSExtInReg is to change Truncate + SBFMWri into SBFMXri + COPY, so I think that if we fix it to produce SBFMWri, this function may become meaningless.
llvm/test/CodeGen/AArch64/aarch64-avoid-implicit-zext-for-SIGN_EXTEND_INREG.ll:3
Yes, these calls can be eliminated. However, with them in place we can directly see the difference in runtime results between -O0 and -O3 after building a binary from this .ll file.
llvm/lib/Target/AArch64/AArch64ISelLowering.h:436
I don't think that's what isBitfieldExtractOpFromSExtInReg is trying to do. Any COPY it results in would be far too implicit and, as you note, completely unreliable. We just can't rely on it being preserved for correctness and shouldn't try. Instead, I see a function that tries to select the correct SBFM variant, but is tripped up by the truncate detection code changing VT under it. N itself produces an i32, so SBFMWri is correct, but that's forgotten in the truncate case.
llvm/test/CodeGen/AArch64/aarch64-avoid-implicit-zext-for-SIGN_EXTEND_INREG.ll:3
.ll tests aren't designed to be executed and we don't optimize them for that.
llvm/lib/Target/AArch64/AArch64ISelLowering.h:436
Oh hang on, I think I see what you mean. Let me think a bit more.
llvm/lib/Target/AArch64/AArch64ISelLowering.h:436
OK, I'm convinced, but please add a comment explaining why this is a special case, and probably mentioning isBitfieldExtractOpFromSExtInReg explicitly as the source.
llvm/lib/Target/AArch64/AArch64ISelLowering.h:436
OK, I will add this comment, and a simpler test case will be added later; please wait a bit.
llvm/lib/Target/AArch64/AArch64ISelLowering.h:436
An isBitfieldExtractOpFromSExtInReg-related comment and a simpler test case have been added.
aarch64-avoid-implicit-zext-for-SIGN_EXTEND_INREG.ll - please can you avoid using capital letters? Every so often it causes problems on Windows dev machines... (TBH it's been better since the git transition.)
llvm/lib/Target/AArch64/AArch64ISelLowering.h:436
If everything is ok, could you please approve this patch? Thanks a lot!
llvm/lib/Target/AArch64/AArch64ISelLowering.h:437
Should COPY here be EXTRACT_SUBREG? I think that's what it is when isel creates it.
llvm/lib/Target/AArch64/AArch64ISelLowering.h:437
Thanks for reviewing; the comment is now modified.
I wonder if we can extend this check to other basic blocks.
For example, if the trunc is in another basic block, the current block's DAG sees a CopyFromReg instead of the trunc. Can we detect the trunc in this case, please?
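A sketch of the cross-block shape being asked about; the names are hypothetical and this is not one of the patch's tests:

```llvm
; The trunc is defined in %entry; when %use is selected, the value only
; arrives as a CopyFromReg, so a per-block check cannot see the trunc.
define i64 @cross_block(i64 %x, i1 %c) {
entry:
  %t = trunc i64 %x to i32
  br i1 %c, label %use, label %done
use:
  %z = zext i32 %t to i64
  ret i64 %z
done:
  ret i64 0
}
```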
Ideally, I think we should kill off isDef32, and run a pass after isel to clean up redundant extension operations. That would have fewer bugs, and we'd get cross-block optimization for free.
I agree with you. I feel we need a custom peephole optimization pass for AArch64. If there is no working patch for it, I could start implementing one.