This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Fold SRA of a load into a narrower sign-extending load
ClosedPublic

Authored by bjope on Jan 10 2022, 4:35 AM.

Details

Summary

An sra is basically sign-extending a narrower value. Fold away the
shift by doing a sextload of a narrower value, when it is legal to
reduce the load width accordingly.

Diff Detail

Event Timeline

bjope created this revision.Jan 10 2022, 4:35 AM
bjope requested review of this revision.Jan 10 2022, 4:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2022, 4:35 AM
bjope updated this revision to Diff 398770.Jan 10 2022, 4:02 PM

Fixup comment that had been edited by mistake.

Nice transform! Might be worth including an IR proof in the description to make it clearer:
https://alive2.llvm.org/ce/z/yYuasA

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12149

The various shift amount variables are difficult to follow. This is really a question for the existing code - can we make it clearer (either through renaming or code comments) how "ShAmt", "ShiftAmt" and "ShLeftAmt" are different?

If we can improve that, is it possible to make a small helper/lambda, so we can share the bailout conditions for SRA/SRL instead of duplicating code?

llvm/test/CodeGen/X86/combine-sra-load.ll
73–76

This comment is not accurate. We are replicating (splatting) the sign bit of the loaded i16 across 32-bits, so there's still a shift.
In IR, instcombine would transform this into:

define i32 @sra_of_sextload_no_fold(i16* %p) {
  %load = load i16, i16* %p, align 2
  %1 = ashr i16 %load, 15
  %shift = sext i16 %1 to i32
  ret i32 %shift
}
bjope added inline comments.Jan 11 2022, 6:19 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12149

Yes. I thought the old names were kind of confusing. So I added the SRA separately. I'll make a separate patch, trying to improve the existing code as well.

One reason I did not share things with SRL (I started off trying to do it that way) is that the solution for SRL seemed to be divided into two parts. I believe the second part (that includes the hasOneUse guard) can be triggered also as being nestled inside an AND (N pointing at the AND and N0 being the SRL). Although, we can probably do some code sharing for the first part regardless of that, at least now when I understand that the unexplained "N0 = SDValue(N, 0)" only should be done only for SRL and not for SRA.

llvm/test/CodeGen/X86/combine-sra-load.ll
73–76

Right, the comment is supposed to say "so we can't fold away the shift".

bjope updated this revision to Diff 399296.Jan 12 2022, 5:27 AM

Rebased upon https://reviews.llvm.org/D117104

The new parent revision contains some refactoring of the old SRL folds, as well as updated code comments.

bjope updated this revision to Diff 400322.Jan 15 2022, 2:27 PM

Rebased.

Does this need to be rebased after rG46cacdbb21c2?

Looks a lot nicer now - @spatel any more comments?

spatel accepted this revision.Jan 24 2022, 6:06 AM

Does this need to be rebased after rG46cacdbb21c2?

Looks a lot nicer now - @spatel any more comments?

LGTM. This part is relatively straightforward now - assuming we got the preceding steps right (might want to give that a little time to bake in the wild).

This revision is now accepted and ready to land.Jan 24 2022, 6:06 AM
bjope added a comment.Jan 24 2022, 6:29 AM

@spatel / @RKSimon : Thanks for all the feedback!
A rebase should be trivial (it's already rebased on the patch that updated the code common with SRL). I'll wait until tomorrow before landing this part.

This revision was landed with ongoing or failed builds.Jan 25 2022, 3:28 AM
This revision was automatically updated to reflect the committed changes.