Page MenuHomePhabricator

[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses
ClosedPublic

Authored by RKSimon on May 14 2022, 6:12 AM.

Details

Summary

If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node.

AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside a AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback.

Part of the work to fix the regressions in D77804

Diff Detail

Event Timeline

RKSimon created this revision.May 14 2022, 6:12 AM
RKSimon requested review of this revision.May 14 2022, 6:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2022, 6:12 AM
RKSimon retitled this revision from [DAG] Fold (shl (srl x, c), c) -> and(x, m) even if x has other uses to [DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses.May 14 2022, 9:02 AM

RISC-V changes LGTM.

SystemZ part LGTM.

@foad @dmgreen Any objections to the AMDGPU / ARM changes?

Do you know what makes the AArch64::shouldFoldConstantShiftPairToMask necessary? Is it just something about those tests, or something fundamental to the architecture?

foad added a comment.May 17 2022, 2:46 AM

@foad @dmgreen Any objections to the AMDGPU / ARM changes?

AMDGPU looks OK to me. Using a literal 0xffff0000 operand increases code size, but on the other had some sequences have one fewer instruction which is nice.

Do you know what makes the AArch64::shouldFoldConstantShiftPairToMask necessary? Is it just something about those tests, or something fundamental to the architecture?

There were a number of test regressions - some of the UXTB matching is affected again (I think this was mainly in a followup patch that I'm working on for an equivalent (srl (shl x, c1), c2)) fold) and some load multiple / calling convention tests were messed up.

Shall I update the patch showing the regressions for comparison?

I'd be happy to address these in a followup patch which will relax/remove the AArch64TargetLowering::shouldFoldConstantShiftPairToMask limit again - I'm currently thinking/hoping using this callback in ARM/AArch64 will help with the UXTB regressions I keep hitting.

dmgreen accepted this revision.May 17 2022, 5:00 AM

I think AArch64 should be able to do the And efficiently in most cases. There is an instcombine equivalent fold that doesn't check one-use, so from that perspective it will only be things that come up from DAG combine that this alters. I see two win vararg tests changing, with code that comes from AArch64TargetLowering::LowerWindowsDYNAMIC_STACKALLOC.

It seems to be only better because the use happens to be a sub instruction that can include the shift for "free". I'm not sure if that is a great reason, from the architecture perspective, to block the transform (unlike Thumb1 where the And cannot easily be done, as in D55630). But perhaps the shifted instruction is a good enough justification.

I'm not a huge fan of the override for aarch64, but as this is unlikely to come up from IR, it sounds like it should be fine. It is unlikely to be worth trying to implement something that is a more precise for this particular case.

So LGTM.

This revision is now accepted and ready to land.May 17 2022, 5:00 AM