An sra is basically sign-extending a narrower value. Fold away the
shift by doing a sextload of a narrower value, when it is legal to
reduce the load width accordingly.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Nice transform! Might be worth including an IR proof in the description to make it clearer:
https://alive2.llvm.org/ce/z/yYuasA
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
12142 | The various shift amount variables are difficult to follow. This is really a question for the existing code - can we make it clearer (either through renaming or code comments) how "ShAmt", "ShiftAmt" and "ShLeftAmt" are different? If we can improve that, is it possible to make a small helper/lambda, so we can share the bailout conditions for SRA/SRL instead of duplicating code? | |
llvm/test/CodeGen/X86/combine-sra-load.ll | ||
74–75 | This comment is not accurate. We are replicating (splatting) the sign bit of the loaded i16 across 32-bits, so there's still a shift. define i32 @sra_of_sextload_no_fold(i16* %p) { %load = load i16, i16* %p, align 2 %1 = ashr i16 %load, 15 %shift = sext i16 %1 to i32 ret i32 %shift } |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
12142 | Yes. I thought the old names were kind of confusing. So I added the SRA separately. I'll make a separate patch, trying to improve the existing code as well. One reason I did not share things with SRL (I started off trying to do it that way) is that the solution for SRL seemed to be divided into two parts. I believe the second part (that includes the hasOneUse guard) can be triggered also as being nestled inside an AND (N pointing at the AND and N0 being the SRL). Although, we can probably do some code sharing for the first part regardless of that, at least now when I understand that the unexplained "N0 = SDValue(N, 0)" only should be done only for SRL and not for SRA. | |
llvm/test/CodeGen/X86/combine-sra-load.ll | ||
74–75 | Right, the comment is supposed to say "so we can't fold away the shift". |
Rebased upon https://reviews.llvm.org/D117104
The new parent revision contains some refactoring of the old SRL folds, as well as updated code comments.
Does this need to be rebased after rG46cacdbb21c2?
Looks a lot nicer now - @spatel any more comments?
LGTM. This part is relatively straightforward now - assuming we got the preceding steps right (might want to give that a little time to bake in the wild).
clang-format not found in user’s local PATH; not linting file.