[ARM] Adjust AND immediates to make them cheaper to select.
ClosedPublic

Authored by efriedma on Mon, Jul 30, 5:18 PM.

Details

Summary

LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics where possible.

Some potential improvements outlined in ARMTargetLowering::targetShrinkDemandedConstant, but seems to work pretty well already.

The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.)

According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate.

Diff Detail

Repository
rL LLVM
efriedma created this revision.Mon, Jul 30, 5:18 PM

Hi Eli,

Would you mind adding a test case, perhaps similar to i16_cmpz, for the uxtb case? Also, I can see there's a test case for bic with -256, but I'm not sure about -2.. so could you also add that edge case as well?

thanks,

test/CodeGen/ARM/Windows/vla.ll
17 ↗(On Diff #158133)

Just a bug in the existing test case?

Sure, I can add a few more tests.

test/CodeGen/ARM/Windows/vla.ll
17 ↗(On Diff #158133)

The new transform prefers the immediate ~4 over the immediate ~7 because it has more bits set. They're equivalent here because the two low bits get shifted out anyway. Maybe the transform should be adjusted to prefer ~7?

samparker added inline comments.Wed, Aug 1, 3:33 AM
test/CodeGen/ARM/Windows/vla.ll
17 ↗(On Diff #158133)

Using the smaller immediate sounds fine to me, especially for T1. Some tests for using BIC instead of an AND with a const pool load would be good.

efriedma updated this revision to Diff 159108.Fri, Aug 3, 3:37 PM

Updated tests.

i16_cmpz specifically is a little weird... TargetLowering::SimplifySetCC has a transform which specifically impacts that case, but only if the immediate isn't legal. Maybe I can come up with something else to exercise the uxtb codepath.

samparker accepted this revision.Wed, Aug 8, 4:51 AM

Thanks for the extra tests, LGTM.

This revision is now accepted and ready to land.Wed, Aug 8, 4:51 AM
This revision was automatically updated to reflect the committed changes.

This change made some compilations run forever. Repro is available at http://martin.st/temp/VbrTag.preproc.c. Compiling with clang -target armv7-w64-mingw32 -c VbrTag.preproc.c -O2 doesn't complete within reasonable time.

Will file a proper PR later.

This change made some compilations run forever. Repro is available at http://martin.st/temp/VbrTag.preproc.c. Compiling with clang -target armv7-w64-mingw32 -c VbrTag.preproc.c -O2 doesn't complete within reasonable time.

Will file a proper PR later.

This is now PR38530.