This is an archive of the discontinued LLVM Phabricator instance.

[X86] Remove patterns for shift/rotate with immediate 1 and optimize during MC lowering
ClosedPublic

Authored by skan on May 8 2023, 3:18 AM.

Details

Summary

It's first suggested by @craig.topper in D150068. I think there are at least three pros

  1. This can reduce the patterns during ISEL, as a result, reducing the bytes in X86GenDAGISel.inc
  2. The patterns for shift/rotate with immediate 1 look quite similar to shift/rotate with immediate 8. So this can be seen as eliminating "duplicate" code.
  3. Delay the optimization from imm8 to imm1, so that the previous optimization passes do not need to handle the version of imm1

It improves fast isel code and makes X86DomainReassignment work for shifts by 1, but regressed global isel, though no one should care.

Diff Detail

Event Timeline

skan created this revision.May 8 2023, 3:18 AM
skan requested review of this revision.May 8 2023, 3:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 8 2023, 3:18 AM

What is the motivation here?

skan added a comment.May 8 2023, 4:27 AM

What is the motivation here?

It's first suggested by craig in D150068. I think there are at least three pros

  1. This can reduce the patterns during ISEL, as a result, reducing the bytes in X86GenDAGISel.inc
  2. The patterns for shift/rotate with immediate 1 look quite similar to shift/rotate with immediate 8. So this can be seen as eliminating "duplicate" code.
  3. Delay the optimization from imm8 to imm1, so that the previous optimization passes do not need to handle the version of imm1
craig.topper added inline comments.May 8 2023, 5:12 PM
llvm/test/tools/llvm-mca/X86/AlderlakeP/resources-x86_64.s
1518 ↗(On Diff #520315)

I think you need to let mayLoad = 0

craig.topper added inline comments.May 8 2023, 5:14 PM
llvm/test/tools/llvm-mca/X86/AlderlakeP/resources-x86_64.s
1518 ↗(On Diff #520315)

Err nevermind

craig.topper added inline comments.May 8 2023, 5:19 PM
llvm/test/tools/llvm-mca/X86/AlderlakeP/resources-x86_64.s
1518 ↗(On Diff #520315)

This should really be a different patch. This is an existing bug independent of whether removing the patterns gets approved.

What is the motivation here?

It's first suggested by craig in D150068. I think there are at least three pros

  1. This can reduce the patterns during ISEL, as a result, reducing the bytes in X86GenDAGISel.inc
  2. The patterns for shift/rotate with immediate 1 look quite similar to shift/rotate with immediate 8. So this can be seen as eliminating "duplicate" code.
  3. Delay the optimization from imm8 to imm1, so that the previous optimization passes do not need to handle the version of imm1

Yeah I thought since the code had been moved to a function it was easy to share it and get an isel table reduction.

Looks like it improved fast isel code and made X86DomainReassignment work for shifts by 1. But regressed global isel, though I guess no one should care.

skan marked 3 inline comments as done.May 8 2023, 6:52 PM

What is the motivation here?

It's first suggested by craig in D150068. I think there are at least three pros

  1. This can reduce the patterns during ISEL, as a result, reducing the bytes in X86GenDAGISel.inc
  2. The patterns for shift/rotate with immediate 1 look quite similar to shift/rotate with immediate 8. So this can be seen as eliminating "duplicate" code.
  3. Delay the optimization from imm8 to imm1, so that the previous optimization passes do not need to handle the version of imm1

Yeah I thought since the code had been moved to a function it was easy to share it and get an isel table reduction.

Looks like it improved fast isel code and made X86DomainReassignment work for shifts by 1. But regressed global isel, though I guess no one should care.

Yes, and thank you for the illustration! The code generated by global isel is far from expected in this test, so I agree no one should care.

llvm/test/tools/llvm-mca/X86/AlderlakeP/resources-x86_64.s
1518 ↗(On Diff #520315)

I thought so. But when I tried to remove the patterns only in the TD, these tests changed too.

If I did not add mayLoad =1, mayStore=1 explicitly for shift/rotate[m1], these instructions would show mayLoad=0, mayStore=0, hasUnModeledSideEffect=1 in these tests.

That's why I updated these bits. It's weird to me if I set mayLoad=1 for m1 version but set mayLoad=0 for mi version.

craig.topper added inline comments.May 8 2023, 8:05 PM
llvm/lib/Target/X86/X86InstrShiftRotate.td
365

I think this line is the one that affected the llvm-mca tests. You can go ahead and commit that and the the llvm-mca test changes.

skan updated this revision to Diff 520604.May 8 2023, 11:57 PM
skan marked 2 inline comments as done.

Address review comments: commit unrelated change first rGc60461e3f815

skan retitled this revision from [X86] Remove patterns for shift/rotate with immediate 1 and update side effect to [X86] Remove patterns for shift/rotate with immediate 1 and optimize during MC lowering.May 9 2023, 12:01 AM
skan edited the summary of this revision. (Show Details)
skan updated this revision to Diff 520627.May 9 2023, 2:07 AM

Remove more patterns

craig.topper added inline comments.May 9 2023, 8:31 PM
llvm/lib/Target/X86/X86MCInstLower.cpp
508

Missing if?

skan updated this revision to Diff 520896.May 9 2023, 8:42 PM
skan marked an inline comment as done.

Address review comments

llvm/lib/Target/X86/X86MCInstLower.cpp
508

Good catch! Done.

craig.topper added inline comments.May 11 2023, 11:55 AM
llvm/lib/Target/X86/MCTargetDesc/X86EncodingOptimization.cpp
139 ↗(On Diff #520896)

Why not check the opcode first?

skan updated this revision to Diff 521538.May 11 2023, 7:56 PM
skan marked an inline comment as done.

Address review comments: check opcode first in optimizeRotateWithImmediate

skan added a comment.May 15 2023, 6:53 PM

Friendly ping

craig.topper added inline comments.May 15 2023, 7:15 PM
llvm/lib/Target/X86/MCTargetDesc/X86EncodingOptimization.cpp
151 ↗(On Diff #521538)

Are the flags the same? I don't think we ever use the flags from rotate, but since we're treating this like almost an assembler optimization we should probably know if we'll break any future flag usages.

skan marked an inline comment as done.May 16 2023, 7:09 AM
skan added inline comments.
llvm/lib/Target/X86/MCTargetDesc/X86EncodingOptimization.cpp
151 ↗(On Diff #521538)

Are the flags the same? I don't think we ever use the flags from rotate, but since we're treating this like almost an assembler optimization we should probably know if we'll break any future flag usages.

Good point! The SPEC says

For ROL and ROR instructions, if
the masked count is 0, the flags are not affected. If the masked count is 1, then the OF flag is affected, otherwise
(masked count is greater than 1) the OF flag is undefined.

So they have different impact on the flags. I will revert the related changes.

skan updated this revision to Diff 522612.May 16 2023, 7:36 AM
skan marked an inline comment as done.

Revert incorrect change about ROL/ROR

This revision is now accepted and ready to land.May 16 2023, 12:50 PM
This revision was landed with ongoing or failed builds.May 17 2023, 4:55 AM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.May 19 2023, 5:24 AM

skan mentioned this in rG2ef8ae134828: [X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC…

That caused assert in Chromium builds. See https://bugs.chromium.org/p/chromium/issues/detail?id=1446973#c2 for a reproducer.

hans added a comment.May 19 2023, 5:43 AM

skan mentioned this in rG2ef8ae134828: [X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC…

That caused assert in Chromium builds. See https://bugs.chromium.org/p/chromium/issues/detail?id=1446973#c2 for a reproducer.

I'll revert while that's investigated.

skan added a comment.EditedMay 19 2023, 7:32 AM

skan mentioned this in rG2ef8ae134828: [X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC…

That caused assert in Chromium builds. See https://bugs.chromium.org/p/chromium/issues/detail?id=1446973#c2 for a reproducer.

I'll revert while that's investigated.

@hans Thanks for the report! If you retested the reproducer before the revert, you would find it already passed after 5586bc539acb26cb94e461438de01a5080513401. So I reapplied the patch.

It failed before b/c the last operand of MCInst ADC/SBBri could be an expression instead of an immediate.

hans added a comment.May 19 2023, 7:56 AM

skan mentioned this in rG2ef8ae134828: [X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC…

That caused assert in Chromium builds. See https://bugs.chromium.org/p/chromium/issues/detail?id=1446973#c2 for a reproducer.

I'll revert while that's investigated.

@hans Thanks for the report! If you retested the reproducer before the revert, you would find it already passed after 5586bc539acb26cb94e461438de01a5080513401. So I reapplied the patch.

It failed before b/c the last operand of MCInst ADC/SBBri could be an expression instead of an immediate.

Sorry about that. I didn't realize 5586bc539acb26cb94e461438de01a5080513401 was a bug fix, and I obviously failed to re-run the test at head.