This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][Isel] Avoid implicit zext for SIGN_EXTEND_INREG (TRUNCATE)
Needs ReviewPublic

Authored by guopeilin on Nov 15 2020, 7:17 PM.

Details

Summary

In some cases, AArch64 use SBFMXri + EXTRACT_SUBREG to implement "sign_extend_inreg (truncate)",
However COPY is usually erased after Register Coalescing, so if we want to zext this result,
we need to avoid using implicit zero-extend.

Diff Detail

Event Timeline

guopeilin created this revision.Nov 15 2020, 7:17 PM
guopeilin requested review of this revision.Nov 15 2020, 7:17 PM
t.p.northover added inline comments.Nov 17 2020, 1:53 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
459

Could we fix isBitfieldExtractOpFromSExtInReg to produce SBFMWri instead?

The other cases here seem to be nodes that we'd have to pessimize in the general case to ensure they zero the high bits on the off-chance they're needed, but for sign-extension it's just a quirk of the implementation.

llvm/test/CodeGen/AArch64/aarch64-avoid-implicit-zext-for-SIGN_EXTEND_INREG.ll
2 ↗(On Diff #305402)

Could you try to simplify this a bit more? There are lots of stray annotations, both calls appear to be unnecessary, and as far as I can tell %bf.load could just as easily be a function argument.

mdchen added a subscriber: mdchen.Nov 20 2020, 12:43 AM
guopeilin added inline comments.Nov 20 2020, 12:52 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
459

I guess the main job of isBitfieldExtractOpFromSExtInReg is trying to change Truncate + SBFMWri into SBFMXri + COPY, So I think if we fix it to produce SBFMWri, this function may become meaningless.
And, I think maybe it's ok to use a pattern like SBFMXri + COPY. The problem is that COPY is likely to be erased after Register Coalescing. And I found a patch that modifies the function shouleCoalesce() (https://reviews.llvm.org/D85956) to avoid coalescing COPY in some cases. However, this solution will cause performance regression and cannot fix this case.
So I think the key point is how to keep the operation that extract the lower bits after Register Coalescing, and I found this patch (https://reviews.llvm.org/D87771). So when we zero extend sign_exten_inreg, we need to look forward, if it comes from a truncate, then we should not use implicit zero extend.

llvm/test/CodeGen/AArch64/aarch64-avoid-implicit-zext-for-SIGN_EXTEND_INREG.ll
2 ↗(On Diff #305402)

Yes, these calls can be eliminated. However, with these calls, we can directly see the difference of runtime results between -O0 and -O3 after generating a binary program from this .ll file.
So I guess maybe it's ok to keep these calls, although it may look redundant.

t.p.northover added inline comments.Nov 20 2020, 1:26 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
459

I don't think that'''s what isBitfieldExtractOpFromSExtInReg is trying to do. Any COPY it results in would be far too implicit, and as you note completely unrelable. We just can't rely on it being preserved for correctness and shouldn't try.

Instead, I see a function that tries to select the correct SBFM variant, but is tripped up by the truncate detection code changing VT under it. N itself produces an i32 so SBFMWri is correct, but that's forgotten in the truncate case.

llvm/test/CodeGen/AArch64/aarch64-avoid-implicit-zext-for-SIGN_EXTEND_INREG.ll
2 ↗(On Diff #305402)

.ll tests aren't designed to be executed and we don't optimize them for that.

t.p.northover added inline comments.Nov 20 2020, 1:38 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
459

Oh hang on, I think I see what you mean. Let me think a bit more.

t.p.northover added inline comments.Nov 20 2020, 1:44 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
459

OK, I'm convinced, but please add a comment explaining why this is a special case, and probably mentioning isBitfieldExtractOpFromSExtInReg explicitly as the source.

guopeilin added inline comments.Nov 20 2020, 2:14 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
459

Ok, I will add this comment, and a simplifier case will be added later, please wait a bit more.

guopeilin added inline comments.Nov 25 2020, 5:18 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
459

a isBitfieldExtractOpFromSExtInReg related comment and a simpler case are added

aarch64-avoid-implicit-zext-for-SIGN_EXTEND_INREG.ll - please can you avoid using capital letters? Every so often it causes problems on windows dev machines.... (TBH its been better since the git transition).

guopeilin updated this revision to Diff 307938.Nov 26 2020, 5:20 PM

aarch64-avoid-implicit-zext-for-SIGN_EXTEND_INREG.ll - please can you avoid using capital letters? Every so often it causes problems on windows dev machines.... (TBH its been better since the git transition).

OK, the filename has been changed.

guopeilin added inline comments.Jan 4 2021, 1:34 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
459

If everything is ok, could you please approve this patch? Thanks a lot!

paulwalker-arm resigned from this revision.Apr 19 2021, 5:02 AM
guopeilin updated this revision to Diff 368847.Aug 26 2021, 4:41 AM
guopeilin retitled this revision from [AArch64][Isel] Avoid implicit zext for SIGN_EXTEND_INREG ( TRUNCATE ) to [AArch64][Isel] Avoid implicit zext for SIGN_EXTEND_INREG (TRUNCATE).
guopeilin edited the summary of this revision. (Show Details)

Pinging reviewers...

craig.topper added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.h
460

Should COPY here be EXTRACT_SUBREG? I think that’s what it is when isel creates it.

guopeilin updated this revision to Diff 369087.Aug 27 2021, 6:30 AM
guopeilin edited the summary of this revision. (Show Details)
guopeilin added inline comments.Aug 27 2021, 7:23 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
460

Thanks for reviewing, the comment is now modified.

I wonder we can extend this check to other basic blocks.

For example, if the trunc is in other basic block, there could be CopyFromReg instead of trunc in current basic block. Can we detect the trunc in this case please?

I wonder we can extend this check to other basic blocks.

For example, if the trunc is in other basic block, there could be CopyFromReg instead of trunc in current basic block. Can we detect the trunc in this case please?

Ideally, I think we should kill off isDef32, and run a pass after isel to clean up redundant extension operations. That would have fewer bugs, and we'd get cross-block optimization for free.

I wonder we can extend this check to other basic blocks.

For example, if the trunc is in other basic block, there could be CopyFromReg instead of trunc in current basic block. Can we detect the trunc in this case please?

Ideally, I think we should kill off isDef32, and run a pass after isel to clean up redundant extension operations. That would have fewer bugs, and we'd get cross-block optimization for free.

I agree with you. I feel we need custom peephole optimization pass for AArch64. If there is no working patch for it, I could start to implement it.