The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.
This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco
Differential D128843
[RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C). craig.topper on Jun 29 2022, 11:29 AM. Authored by
Details The sext_inreg can often be folded into an earlier instruction by This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco
Diff Detail
Event TimelineComment Actions Given this only deals with generic nodes, would this make sense to be a TLI-guided generic combine? (And maybe other targets could benefit from it, too, or already do this in target combines?) Comment Actions Looks like X86 has something similar in combineShiftRightArithmetic, but they handle a more generic form. // fold (ashr (shl, a, [56,48,32,24,16]), SarConst) // into (shl, (sext (a), [56,48,32,24,16] - SarConst)) or // into (lshr, (sext (a), SarConst - [56,48,32,24,16])) // depending on sign of (SarConst - [56,48,32,24,16]) For RISC-V we also support sext_inreg for i8 and i16 with Zbb. I'm not sure we want to convert to it naively since it won't always fold away and zext.h/zext.b aren't compressible. It probablys makes sense to convert i8/i16 if the input is a load even without Zbb. But that probably needs to create a sextload instead of creating a sext_inreg. I have another patch I'm working on that depends on this one. So I'd like to consider making this target independent as a future pass. I'll add a FIXME. Comment Actions I'm probably going to end up relaxing the one use check here to be that all users of the shl X, 32 are SRA instructions so that we can end up with a sext_inreg shared by multiple shls. Comment Actions My plan had been to solve https://godbolt.org/z/hssn6sPco by adding a fold for (add (shl X, 32), C<<32) -> (shl (add X, C), 32) and then let this patch optimize the shift with sra that might be after it. I thought the (add (shl X, 32), C<<32) fold could stand on its own as well, but in testing I found that some values of C cause an infinite loop with isDesirableToCommuteWithShift. The larger (sra (add (shl X, 32), C << 32), 32-C1) -> (shl (sext_inreg (add X, C), i32), C1) pattern is probably always profitable since the sext_inreg will always becomes an ADDW.
|
I typoed this comment