This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C).
ClosedPublic

Authored by craig.topper on Jun 29 2022, 11:29 AM.

Details

Summary

The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.

This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco

Diff Detail

Event Timeline

craig.topper created this revision.Jun 29 2022, 11:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2022, 11:29 AM
craig.topper requested review of this revision.Jun 29 2022, 11:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2022, 11:29 AM
asb added a comment.EditedJun 29 2022, 11:36 AM

I think you need to push the commit with rv64i-shift-sext.ll?

Given this only deals with generic nodes, would this make sense to be a TLI-guided generic combine? (And maybe other targets could benefit from it, too, or already do this in target combines?)

Given this only deals with generic nodes, would this make sense to be a TLI-guided generic combine? (And maybe other targets could benefit from it, too, or already do this in target combines?)

Looks like X86 has something similar in combineShiftRightArithmetic, but they handle a more generic form.

// fold (ashr (shl, a, [56,48,32,24,16]), SarConst)                            
// into (shl, (sext (a), [56,48,32,24,16] - SarConst)) or                      
// into (lshr, (sext (a), SarConst - [56,48,32,24,16]))                        
// depending on sign of (SarConst - [56,48,32,24,16])

For RISC-V we also support sext_inreg for i8 and i16 with Zbb. I'm not sure we want to convert to it naively since it won't always fold away and zext.h/zext.b aren't compressible. It probablys makes sense to convert i8/i16 if the input is a load even without Zbb. But that probably needs to create a sextload instead of creating a sext_inreg.

I have another patch I'm working on that depends on this one. So I'd like to consider making this target independent as a future pass. I'll add a FIXME.

Add FIXME to make this generic.

I'm probably going to end up relaxing the one use check here to be that all users of the shl X, 32 are SRA instructions so that we can end up with a sext_inreg shared by multiple shls.

craig.topper edited the summary of this revision. (Show Details)Jun 29 2022, 1:06 PM

My plan had been to solve https://godbolt.org/z/hssn6sPco by adding a fold for (add (shl X, 32), C<<32) -> (shl (add X, C), 32) and then let this patch optimize the shift with sra that might be after it. I thought the (add (shl X, 32), C<<32) fold could stand on its own as well, but in testing I found that some values of C cause an infinite loop with isDesirableToCommuteWithShift.

The larger (sra (add (shl X, 32), C << 32), 32-C1) -> (shl (sext_inreg (add X, C), i32), C1) pattern is probably always profitable since the sext_inreg will always becomes an ADDW.

craig.topper retitled this revision from [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (sra (sext_inreg X, i32), C). to [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C)..Jun 29 2022, 2:52 PM
craig.topper added inline comments.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8533

I typoed this comment

Fix mistake in comment

asb accepted this revision.Jun 30 2022, 7:21 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jun 30 2022, 7:21 AM
This revision was landed with ongoing or failed builds.Jun 30 2022, 9:02 AM
This revision was automatically updated to reflect the committed changes.